{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Linking banking transactions\n",
        "\n",
        "This example shows how to perform a one-to-one link on banking transactions.\n",
        "\n",
        "The data is synthetic and was generated with the following features:\n",
        "\n",
        "- Money shows up in the destination account with some time delay\n",
        "- The amount sent and the amount received are not always the same - there are hidden fees and foreign exchange effects\n",
        "- The memo is sometimes truncated and content is sometimes missing\n",
        "\n",
        "Since each origin payment should end up in the destination account, the `probability_two_random_records_match` of the model is known.\n"
      ]
    },
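    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough worked example of why this value is known (a sketch, assuming every origin record has exactly one true counterpart in the destination data; the symbols $n_o$ and $n_d$ for the number of origin and destination records are introduced here for illustration only): a link-only job considers $n_o \\times n_d$ candidate pairs, of which $n_o$ are true matches, so\n",
        "\n",
        "$$\n",
        "P(\\text{two random records match}) = \\frac{n_o}{n_o \\times n_d} = \\frac{1}{n_d}\n",
        "$$\n",
        "\n",
        "When both datasets contain the same number of records $n$, this reduces to $1/n$.\n"
      ]
    },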
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a target=\"_blank\" href=\"https://colab.research.google.com/github/moj-analytical-services/splink/blob/master/docs/demos/examples/duckdb/transactions.ipynb\">\n",
        "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
        "</a>\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:27.648457Z",
          "iopub.status.busy": "2024-06-07T09:22:27.648128Z",
          "iopub.status.idle": "2024-06-07T09:22:27.653498Z",
          "shell.execute_reply": "2024-06-07T09:22:27.652626Z"
        },
        "tags": [
          "hide_input"
        ]
      },
      "outputs": [],
      "source": [
        "# Uncomment and run this cell if you're running in Google Colab.\n",
        "# !pip install splink"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:27.657230Z",
          "iopub.status.busy": "2024-06-07T09:22:27.656926Z",
          "iopub.status.idle": "2024-06-07T09:22:31.983888Z",
          "shell.execute_reply": "2024-06-07T09:22:31.983040Z"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ground_truth</th>\n",
              "      <th>memo</th>\n",
              "      <th>transaction_date</th>\n",
              "      <th>amount</th>\n",
              "      <th>unique_id</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>MATTHIAS C paym</td>\n",
              "      <td>2022-03-28</td>\n",
              "      <td>36.36</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>M CORVINUS dona</td>\n",
              "      <td>2022-02-14</td>\n",
              "      <td>221.91</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   ground_truth             memo transaction_date  amount  unique_id\n",
              "0             0  MATTHIAS C paym       2022-03-28   36.36          0\n",
              "1             1  M CORVINUS dona       2022-02-14  221.91          1"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>ground_truth</th>\n",
              "      <th>memo</th>\n",
              "      <th>transaction_date</th>\n",
              "      <th>amount</th>\n",
              "      <th>unique_id</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>MATTHIAS C payment BGC</td>\n",
              "      <td>2022-03-29</td>\n",
              "      <td>36.36</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>M CORVINUS BGC</td>\n",
              "      <td>2022-02-16</td>\n",
              "      <td>221.91</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "   ground_truth                    memo transaction_date  amount  unique_id\n",
              "0             0  MATTHIAS C payment BGC       2022-03-29   36.36          0\n",
              "1             1          M CORVINUS BGC       2022-02-16  221.91          1"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets\n",
        "\n",
        "df_origin = splink_datasets.transactions_origin\n",
        "df_destination = splink_datasets.transactions_destination\n",
        "\n",
        "display(df_origin.head(2))\n",
        "display(df_destination.head(2))"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In the following chart, we can see this is a challenging dataset to link:\n",
        "\n",
        "- There are only 151 distinct transaction dates, with strong skew\n",
        "- Some 'memos' are used multiple times (up to 48 times)\n",
        "- There is strong skew in the 'amount' column, with 1,400 transactions of around 60.00\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:31.987843Z",
          "iopub.status.busy": "2024-06-07T09:22:31.987459Z",
          "iopub.status.idle": "2024-06-07T09:22:32.720064Z",
          "shell.execute_reply": "2024-06-07T09:22:32.719389Z"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "<style>\n",
              "  #altair-viz-b63b243eaf2f4ad48e06e91effab406a.vega-embed {\n",
              "    width: 100%;\n",
              "    display: flex;\n",
              "  }\n",
              "\n",
              "  #altair-viz-b63b243eaf2f4ad48e06e91effab406a.vega-embed details,\n",
              "  #altair-viz-b63b243eaf2f4ad48e06e91effab406a.vega-embed details summary {\n",
              "    position: relative;\n",
              "  }\n",
              "</style>\n",
              "<div id=\"altair-viz-b63b243eaf2f4ad48e06e91effab406a\"></div>\n",
              "<script type=\"text/javascript\">\n",
              "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
              "  (function(spec, embedOpt){\n",
              "    let outputDiv = document.currentScript.previousElementSibling;\n",
              "    if (outputDiv.id !== \"altair-viz-b63b243eaf2f4ad48e06e91effab406a\") {\n",
              "      outputDiv = document.getElementById(\"altair-viz-b63b243eaf2f4ad48e06e91effab406a\");\n",
              "    }\n",
              "    const paths = {\n",
              "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
              "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
              "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
              "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
              "    };\n",
              "\n",
              "    function maybeLoadScript(lib, version) {\n",
              "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
              "      return (VEGA_DEBUG[key] == version) ?\n",
              "        Promise.resolve(paths[lib]) :\n",
              "        new Promise(function(resolve, reject) {\n",
              "          var s = document.createElement('script');\n",
              "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
              "          s.async = true;\n",
              "          s.onload = () => {\n",
              "            VEGA_DEBUG[key] = version;\n",
              "            return resolve(paths[lib]);\n",
              "          };\n",
              "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
              "          s.src = paths[lib];\n",
              "        });\n",
              "    }\n",
              "\n",
              "    function showError(err) {\n",
              "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
              "      throw err;\n",
              "    }\n",
              "\n",
              "    function displayChart(vegaEmbed) {\n",
              "      vegaEmbed(outputDiv, spec, embedOpt)\n",
              "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
              "    }\n",
              "\n",
              "    if(typeof define === \"function\" && define.amd) {\n",
              "      requirejs.config({paths});\n",
              "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
              "    } else {\n",
              "      maybeLoadScript(\"vega\", \"5\")\n",
              "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
              "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
              "        .catch(showError)\n",
              "        .then(() => displayChart(vegaEmbed));\n",
              "    }\n",
              "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"vconcat\": [{\"hconcat\": [{\"mark\": {\"type\": \"line\", \"interpolate\": \"step-after\"}, \"data\": {\"values\": [{\"percentile_ex_nulls\": 0.9994705319404602, \"percentile_inc_nulls\": 0.9994705319404602, \"value_count\": 48, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 48.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.999029278755188, \"percentile_inc_nulls\": 0.999029278755188, \"value_count\": 40, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 40.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9986211061477661, \"percentile_inc_nulls\": 0.9986211061477661, \"value_count\": 37, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 37.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9982239603996277, \"percentile_inc_nulls\": 0.9982239603996277, \"value_count\": 36, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 36.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9978379011154175, \"percentile_inc_nulls\": 0.9978379011154175, \"value_count\": 35, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 35.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9974738359451294, \"percentile_inc_nulls\": 0.9974738359451294, \"value_count\": 33, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 33.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9971208572387695, \"percentile_inc_nulls\": 0.9971208572387695, 
\"value_count\": 32, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 32.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.99680095911026, \"percentile_inc_nulls\": 0.99680095911026, \"value_count\": 29, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 29.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9964920878410339, \"percentile_inc_nulls\": 0.9964920878410339, \"value_count\": 28, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 28.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9961942434310913, \"percentile_inc_nulls\": 0.9961942434310913, \"value_count\": 27, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 27.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9953337907791138, \"percentile_inc_nulls\": 0.9953337907791138, \"value_count\": 26, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 78.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.995058000087738, \"percentile_inc_nulls\": 0.995058000087738, \"value_count\": 25, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 25.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9945285320281982, \"percentile_inc_nulls\": 0.9945285320281982, \"value_count\": 24, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 48.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9930062294006348, \"percentile_inc_nulls\": 0.9930062294006348, \"value_count\": 23, \"group_name\": 
\"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 138.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9925208687782288, \"percentile_inc_nulls\": 0.9925208687782288, \"value_count\": 22, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 44.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9908992648124695, \"percentile_inc_nulls\": 0.9908992648124695, \"value_count\": 21, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 147.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9897961616516113, \"percentile_inc_nulls\": 0.9897961616516113, \"value_count\": 20, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 100.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9883289933204651, \"percentile_inc_nulls\": 0.9883289933204651, \"value_count\": 19, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 133.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9859462380409241, \"percentile_inc_nulls\": 0.9859462380409241, \"value_count\": 18, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 216.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9842584729194641, \"percentile_inc_nulls\": 0.9842584729194641, \"value_count\": 17, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 153.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9803755283355713, \"percentile_inc_nulls\": 0.9803755283355713, \"value_count\": 16, \"group_name\": \"memo\", 
\"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 352.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9772316217422485, \"percentile_inc_nulls\": 0.9772316217422485, \"value_count\": 15, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 285.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.975223958492279, \"percentile_inc_nulls\": 0.975223958492279, \"value_count\": 14, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 182.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9713519811630249, \"percentile_inc_nulls\": 0.9713519811630249, \"value_count\": 13, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 351.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9661893844604492, \"percentile_inc_nulls\": 0.9661893844604492, \"value_count\": 12, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 468.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9600008726119995, \"percentile_inc_nulls\": 0.9600008726119995, \"value_count\": 11, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 561.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9508450031280518, \"percentile_inc_nulls\": 0.9508450031280518, \"value_count\": 10, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 830.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9396262764930725, \"percentile_inc_nulls\": 0.9396262764930725, \"value_count\": 9, \"group_name\": \"memo\", \"total_non_null_rows\": 
90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1017.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9244473576545715, \"percentile_inc_nulls\": 0.9244473576545715, \"value_count\": 8, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1376.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.9059921503067017, \"percentile_inc_nulls\": 0.9059921503067017, \"value_count\": 7, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1673.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.8743546605110168, \"percentile_inc_nulls\": 0.8743546605110168, \"value_count\": 6, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 2868.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.8320500254631042, \"percentile_inc_nulls\": 0.8320500254631042, \"value_count\": 5, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 3835.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.7496249675750732, \"percentile_inc_nulls\": 0.7496249675750732, \"value_count\": 4, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 7472.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.6540506482124329, \"percentile_inc_nulls\": 0.6540506482124329, \"value_count\": 3, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 8664.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.35420066118240356, \"percentile_inc_nulls\": 0.35420066118240356, \"value_count\": 2, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, 
\"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 27182.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 0.0, \"percentile_inc_nulls\": 0.0, \"value_count\": 1, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 32109.0, \"distinct_value_count\": 52543}, {\"percentile_ex_nulls\": 1.0, \"percentile_inc_nulls\": 1.0, \"value_count\": 48, \"group_name\": \"memo\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 48.0, \"distinct_value_count\": 52543}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"percentile_ex_nulls\", \"type\": \"quantitative\"}, {\"field\": \"percentile_inc_nulls\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"percentile_ex_nulls\", \"sort\": \"descending\", \"title\": \"Percentile\", \"type\": \"quantitative\"}, \"y\": {\"field\": \"value_count\", \"title\": \"Count of values\", \"type\": \"quantitative\"}}, \"title\": {\"text\": \"Distribution of counts of values in column memo\", \"subtitle\": \"In this col, 0 values (0.0%) are null and there are 52543 distinct values\"}}, {\"mark\": \"bar\", \"data\": {\"values\": [{\"value_count\": 48, \"group_name\": \"memo\", \"value\": \"J B BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 40, \"group_name\": \"memo\", \"value\": \"J B payment BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 37, \"group_name\": \"memo\", \"value\": \"J B\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 36, \"group_name\": \"memo\", \"value\": \"J B money 
BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 35, \"group_name\": \"memo\", \"value\": \"J B donation BG\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 33, \"group_name\": \"memo\", \"value\": \"J B  BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 32, \"group_name\": \"memo\", \"value\": \"J S money BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 29, \"group_name\": \"memo\", \"value\": \"A B BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 28, \"group_name\": \"memo\", \"value\": \"A B money BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 27, \"group_name\": \"memo\", \"value\": \"J C money BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value\", \"type\": \"nominal\"}, {\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"value\", \"sort\": \"-y\", \"title\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"value_count\", \"title\": \"Value count\", \"type\": \"quantitative\"}}, \"title\": \"Top 10 values by value count\"}, {\"mark\": \"bar\", \"data\": {\"values\": [{\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"M CORVINUS  BGC\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"M CORVINUS  CSH\", \"total_non_null_rows\": 90652, 
\"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"MATTHIAS C paym\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"M CORVINUS CSH\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"LORENZO D MEDICI donation CSH\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"A D THE ELDER d\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"A C 909375fb BG\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"AIKATERINI C money CSH\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"AIKATERINI C money CHQ\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}, {\"value_count\": 1, \"group_name\": \"memo\", \"value\": \"AIKATERINI C  C\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 52543}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value\", \"type\": \"nominal\"}, {\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"value\", \"sort\": \"-y\", \"title\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"value_count\", \"scale\": {\"domain\": [0, 48]}, \"title\": \"Value count\", \"type\": 
\"quantitative\"}}, \"title\": \"Bottom 10 values by value count\"}]}, {\"hconcat\": [{\"mark\": {\"type\": \"line\", \"interpolate\": \"step-after\"}, \"data\": {\"values\": [{\"percentile_ex_nulls\": 0.31922078132629395, \"percentile_inc_nulls\": 0.31922078132629395, \"value_count\": 758, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 758.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.31094735860824585, \"percentile_inc_nulls\": 0.31094735860824585, \"value_count\": 750, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 750.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.30285048484802246, \"percentile_inc_nulls\": 0.30285048484802246, \"value_count\": 734, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 734.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.2948087453842163, \"percentile_inc_nulls\": 0.2948087453842163, \"value_count\": 729, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 729.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.2867780327796936, \"percentile_inc_nulls\": 0.2867780327796936, \"value_count\": 728, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 728.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.2787693738937378, \"percentile_inc_nulls\": 0.2787693738937378, \"value_count\": 726, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 726.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.2708930969238281, 
\"percentile_inc_nulls\": 0.2708930969238281, \"value_count\": 714, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 714.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.26316022872924805, \"percentile_inc_nulls\": 0.26316022872924805, \"value_count\": 701, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 701.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.25547146797180176, \"percentile_inc_nulls\": 0.25547146797180176, \"value_count\": 697, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 697.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.24785995483398438, \"percentile_inc_nulls\": 0.24785995483398438, \"value_count\": 690, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 690.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.24048006534576416, \"percentile_inc_nulls\": 0.24048006534576416, \"value_count\": 669, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 669.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.23322153091430664, \"percentile_inc_nulls\": 0.23322153091430664, \"value_count\": 658, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 658.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.22604024410247803, \"percentile_inc_nulls\": 0.22604024410247803, \"value_count\": 651, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 651.0, 
\"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.21891409158706665, \"percentile_inc_nulls\": 0.21891409158706665, \"value_count\": 646, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 646.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.21187621355056763, \"percentile_inc_nulls\": 0.21187621355056763, \"value_count\": 638, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 638.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.20532363653182983, \"percentile_inc_nulls\": 0.20532363653182983, \"value_count\": 594, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 594.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.19880419969558716, \"percentile_inc_nulls\": 0.19880419969558716, \"value_count\": 591, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 591.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.19238406419754028, \"percentile_inc_nulls\": 0.19238406419754028, \"value_count\": 582, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 582.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.1859859824180603, \"percentile_inc_nulls\": 0.1859859824180603, \"value_count\": 580, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 580.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.1797202229499817, \"percentile_inc_nulls\": 0.1797202229499817, \"value_count\": 568, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, 
\"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 568.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.16734325885772705, \"percentile_inc_nulls\": 0.16734325885772705, \"value_count\": 561, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1122.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.1615297794342041, \"percentile_inc_nulls\": 0.1615297794342041, \"value_count\": 527, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 527.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.15580463409423828, \"percentile_inc_nulls\": 0.15580463409423828, \"value_count\": 519, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 519.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.15009045600891113, \"percentile_inc_nulls\": 0.15009045600891113, \"value_count\": 518, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 518.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.14444249868392944, \"percentile_inc_nulls\": 0.14444249868392944, \"value_count\": 512, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 512.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.1388276219367981, \"percentile_inc_nulls\": 0.1388276219367981, \"value_count\": 509, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 509.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.13330096006393433, \"percentile_inc_nulls\": 0.13330096006393433, \"value_count\": 501, 
\"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 501.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.12780743837356567, \"percentile_inc_nulls\": 0.12780743837356567, \"value_count\": 498, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 498.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.12236905097961426, \"percentile_inc_nulls\": 0.12236905097961426, \"value_count\": 493, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 493.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.11723953485488892, \"percentile_inc_nulls\": 0.11723953485488892, \"value_count\": 465, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 465.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.1125292181968689, \"percentile_inc_nulls\": 0.1125292181968689, \"value_count\": 427, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 427.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.10786300897598267, \"percentile_inc_nulls\": 0.10786300897598267, \"value_count\": 423, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 423.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.1032189130783081, \"percentile_inc_nulls\": 0.1032189130783081, \"value_count\": 421, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 421.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.09859681129455566, 
\"percentile_inc_nulls\": 0.09859681129455566, \"value_count\": 419, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 419.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.09408509731292725, \"percentile_inc_nulls\": 0.09408509731292725, \"value_count\": 409, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 409.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.08977186679840088, \"percentile_inc_nulls\": 0.08977186679840088, \"value_count\": 391, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 391.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.08559107780456543, \"percentile_inc_nulls\": 0.08559107780456543, \"value_count\": 379, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 379.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.08148741722106934, \"percentile_inc_nulls\": 0.08148741722106934, \"value_count\": 372, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 372.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.07742798328399658, \"percentile_inc_nulls\": 0.07742798328399658, \"value_count\": 368, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 368.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.07365530729293823, \"percentile_inc_nulls\": 0.07365530729293823, \"value_count\": 342, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 342.0, 
\"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.06990468502044678, \"percentile_inc_nulls\": 0.06990468502044678, \"value_count\": 340, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 340.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.06624233722686768, \"percentile_inc_nulls\": 0.06624233722686768, \"value_count\": 332, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 332.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.06271237134933472, \"percentile_inc_nulls\": 0.06271237134933472, \"value_count\": 320, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 320.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.05934780836105347, \"percentile_inc_nulls\": 0.05934780836105347, \"value_count\": 305, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 305.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.056016385555267334, \"percentile_inc_nulls\": 0.056016385555267334, \"value_count\": 302, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 302.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.049463868141174316, \"percentile_inc_nulls\": 0.049463868141174316, \"value_count\": 297, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 594.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.046452343463897705, \"percentile_inc_nulls\": 0.046452343463897705, \"value_count\": 273, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, 
\"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 273.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.04352909326553345, \"percentile_inc_nulls\": 0.04352909326553345, \"value_count\": 265, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 265.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.04068303108215332, \"percentile_inc_nulls\": 0.04068303108215332, \"value_count\": 258, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 258.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.03554248809814453, \"percentile_inc_nulls\": 0.03554248809814453, \"value_count\": 233, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 466.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.03330320119857788, \"percentile_inc_nulls\": 0.03330320119857788, \"value_count\": 203, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 203.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.031107962131500244, \"percentile_inc_nulls\": 0.031107962131500244, \"value_count\": 199, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 199.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.028945863246917725, \"percentile_inc_nulls\": 0.028945863246917725, \"value_count\": 196, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 196.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.023286879062652588, \"percentile_inc_nulls\": 0.023286879062652588, \"value_count\": 
171, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 513.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.9846776723861694, \"percentile_inc_nulls\": 0.9846776723861694, \"value_count\": 1389, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1389.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.9695428609848022, \"percentile_inc_nulls\": 0.9695428609848022, \"value_count\": 1372, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1372.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.9547831416130066, \"percentile_inc_nulls\": 0.9547831416130066, \"value_count\": 1338, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1338.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.9402440190315247, \"percentile_inc_nulls\": 0.9402440190315247, \"value_count\": 1318, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1318.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.9257820844650269, \"percentile_inc_nulls\": 0.9257820844650269, \"value_count\": 1311, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1311.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.9114856719970703, \"percentile_inc_nulls\": 0.9114856719970703, \"value_count\": 1296, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1296.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8972113132476807, 
\"percentile_inc_nulls\": 0.8972113132476807, \"value_count\": 1294, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1294.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8829590082168579, \"percentile_inc_nulls\": 0.8829590082168579, \"value_count\": 1292, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1292.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8690376281738281, \"percentile_inc_nulls\": 0.8690376281738281, \"value_count\": 1262, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1262.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8551493883132935, \"percentile_inc_nulls\": 0.8551493883132935, \"value_count\": 1259, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1259.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8413382768630981, \"percentile_inc_nulls\": 0.8413382768630981, \"value_count\": 1252, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1252.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.827560305595398, \"percentile_inc_nulls\": 0.827560305595398, \"value_count\": 1249, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1249.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8138927221298218, \"percentile_inc_nulls\": 0.8138927221298218, \"value_count\": 1239, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1239.0, 
\"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.8002581596374512, \"percentile_inc_nulls\": 0.8002581596374512, \"value_count\": 1236, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1236.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.786733865737915, \"percentile_inc_nulls\": 0.786733865737915, \"value_count\": 1226, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1226.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.7732758522033691, \"percentile_inc_nulls\": 0.7732758522033691, \"value_count\": 1220, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1220.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.7598508596420288, \"percentile_inc_nulls\": 0.7598508596420288, \"value_count\": 1217, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1217.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.7468119859695435, \"percentile_inc_nulls\": 0.7468119859695435, \"value_count\": 1182, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1182.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.7338833808898926, \"percentile_inc_nulls\": 0.7338833808898926, \"value_count\": 1172, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1172.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.7209658622741699, \"percentile_inc_nulls\": 0.7209658622741699, \"value_count\": 1171, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, 
\"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1171.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.7081476449966431, \"percentile_inc_nulls\": 0.7081476449966431, \"value_count\": 1162, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1162.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.682908296585083, \"percentile_inc_nulls\": 0.682908296585083, \"value_count\": 1144, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 2288.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.6703988909721375, \"percentile_inc_nulls\": 0.6703988909721375, \"value_count\": 1134, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1134.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.6579005718231201, \"percentile_inc_nulls\": 0.6579005718231201, \"value_count\": 1133, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1133.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.645556628704071, \"percentile_inc_nulls\": 0.645556628704071, \"value_count\": 1119, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1119.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.6336208581924438, \"percentile_inc_nulls\": 0.6336208581924438, \"value_count\": 1082, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1082.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.6218065023422241, \"percentile_inc_nulls\": 0.6218065023422241, \"value_count\": 1071, 
\"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1071.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.6105877161026001, \"percentile_inc_nulls\": 0.6105877161026001, \"value_count\": 1017, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1017.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5994572639465332, \"percentile_inc_nulls\": 0.5994572639465332, \"value_count\": 1009, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1009.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.588392972946167, \"percentile_inc_nulls\": 0.588392972946167, \"value_count\": 1003, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1003.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5773618221282959, \"percentile_inc_nulls\": 0.5773618221282959, \"value_count\": 1000, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1000.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5663416385650635, \"percentile_inc_nulls\": 0.5663416385650635, \"value_count\": 999, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 999.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5445660352706909, \"percentile_inc_nulls\": 0.5445660352706909, \"value_count\": 987, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1974.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5337003469467163, 
\"percentile_inc_nulls\": 0.5337003469467163, \"value_count\": 985, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 985.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5228787064552307, \"percentile_inc_nulls\": 0.5228787064552307, \"value_count\": 981, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 981.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5120902061462402, \"percentile_inc_nulls\": 0.5120902061462402, \"value_count\": 978, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 978.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.5013458132743835, \"percentile_inc_nulls\": 0.5013458132743835, \"value_count\": 974, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 974.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.4907558560371399, \"percentile_inc_nulls\": 0.4907558560371399, \"value_count\": 960, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 960.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.4801769256591797, \"percentile_inc_nulls\": 0.4801769256591797, \"value_count\": 959, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 959.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.46990692615509033, \"percentile_inc_nulls\": 0.46990692615509033, \"value_count\": 931, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 931.0, 
\"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.45968097448349, \"percentile_inc_nulls\": 0.45968097448349, \"value_count\": 927, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 927.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.44946610927581787, \"percentile_inc_nulls\": 0.44946610927581787, \"value_count\": 926, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 926.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.43939459323883057, \"percentile_inc_nulls\": 0.43939459323883057, \"value_count\": 913, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 913.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.4294665455818176, \"percentile_inc_nulls\": 0.4294665455818176, \"value_count\": 900, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 900.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.41957151889801025, \"percentile_inc_nulls\": 0.41957151889801025, \"value_count\": 897, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 897.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.40973174571990967, \"percentile_inc_nulls\": 0.40973174571990967, \"value_count\": 892, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 892.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.4000021815299988, \"percentile_inc_nulls\": 0.4000021815299988, \"value_count\": 882, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, 
\"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 882.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.3903830051422119, \"percentile_inc_nulls\": 0.3903830051422119, \"value_count\": 872, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 872.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.38102853298187256, \"percentile_inc_nulls\": 0.38102853298187256, \"value_count\": 848, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 848.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.37176233530044556, \"percentile_inc_nulls\": 0.37176233530044556, \"value_count\": 840, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 840.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.36263954639434814, \"percentile_inc_nulls\": 0.36263954639434814, \"value_count\": 827, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 827.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.3537042737007141, \"percentile_inc_nulls\": 0.3537042737007141, \"value_count\": 810, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 810.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.34483522176742554, \"percentile_inc_nulls\": 0.34483522176742554, \"value_count\": 804, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 804.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.3275824189186096, \"percentile_inc_nulls\": 0.3275824189186096, \"value_count\": 782, 
\"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1564.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.021466732025146484, \"percentile_inc_nulls\": 0.021466732025146484, \"value_count\": 165, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 165.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.01811319589614868, \"percentile_inc_nulls\": 0.01811319589614868, \"value_count\": 152, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 304.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.016590893268585205, \"percentile_inc_nulls\": 0.016590893268585205, \"value_count\": 138, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 138.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.015322327613830566, \"percentile_inc_nulls\": 0.015322327613830566, \"value_count\": 115, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 115.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.014064788818359375, \"percentile_inc_nulls\": 0.014064788818359375, \"value_count\": 114, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 114.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.012851357460021973, \"percentile_inc_nulls\": 0.012851357460021973, \"value_count\": 110, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 110.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 
0.011671006679534912, \"percentile_inc_nulls\": 0.011671006679534912, \"value_count\": 107, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 107.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0106009840965271, \"percentile_inc_nulls\": 0.0106009840965271, \"value_count\": 97, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 97.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.009553015232086182, \"percentile_inc_nulls\": 0.009553015232086182, \"value_count\": 95, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 95.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.008615374565124512, \"percentile_inc_nulls\": 0.008615374565124512, \"value_count\": 85, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 85.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.007743895053863525, \"percentile_inc_nulls\": 0.007743895053863525, \"value_count\": 79, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 79.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0069165825843811035, \"percentile_inc_nulls\": 0.0069165825843811035, \"value_count\": 75, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 75.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.006243646144866943, \"percentile_inc_nulls\": 0.006243646144866943, \"value_count\": 61, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, 
\"sum_tokens_in_value_count_group\": 61.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0056038498878479, \"percentile_inc_nulls\": 0.0056038498878479, \"value_count\": 58, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 58.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.004986107349395752, \"percentile_inc_nulls\": 0.004986107349395752, \"value_count\": 56, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 56.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.004467606544494629, \"percentile_inc_nulls\": 0.004467606544494629, \"value_count\": 47, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 47.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0039601922035217285, \"percentile_inc_nulls\": 0.0039601922035217285, \"value_count\": 46, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 46.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.003529965877532959, \"percentile_inc_nulls\": 0.003529965877532959, \"value_count\": 39, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 39.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0031218528747558594, \"percentile_inc_nulls\": 0.0031218528747558594, \"value_count\": 37, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 37.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0027247071266174316, \"percentile_inc_nulls\": 0.0027247071266174316, \"value_count\": 36, \"group_name\": 
\"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 36.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0023496150970458984, \"percentile_inc_nulls\": 0.0023496150970458984, \"value_count\": 34, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 34.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.001985609531402588, \"percentile_inc_nulls\": 0.001985609531402588, \"value_count\": 33, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 33.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0016767382621765137, \"percentile_inc_nulls\": 0.0016767382621765137, \"value_count\": 28, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 28.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0013788938522338867, \"percentile_inc_nulls\": 0.0013788938522338867, \"value_count\": 27, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 27.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0011031031608581543, \"percentile_inc_nulls\": 0.0011031031608581543, \"value_count\": 25, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 25.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0009045600891113281, \"percentile_inc_nulls\": 0.0009045600891113281, \"value_count\": 18, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 18.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0005515813827514648, 
\"percentile_inc_nulls\": 0.0005515813827514648, \"value_count\": 16, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 32.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.00038611888885498047, \"percentile_inc_nulls\": 0.00038611888885498047, \"value_count\": 15, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 15.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.00024271011352539062, \"percentile_inc_nulls\": 0.00024271011352539062, \"value_count\": 13, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 13.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.00011032819747924805, \"percentile_inc_nulls\": 0.00011032819747924805, \"value_count\": 12, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 12.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 5.513429641723633e-05, \"percentile_inc_nulls\": 5.513429641723633e-05, \"value_count\": 5, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 5.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 2.205371856689453e-05, \"percentile_inc_nulls\": 2.205371856689453e-05, \"value_count\": 3, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 3.0, \"distinct_value_count\": 151}, {\"percentile_ex_nulls\": 0.0, \"percentile_inc_nulls\": 0.0, \"value_count\": 2, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 2.0, \"distinct_value_count\": 151}, 
{\"percentile_ex_nulls\": 1.0, \"percentile_inc_nulls\": 1.0, \"value_count\": 758, \"group_name\": \"transaction_date\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 758.0, \"distinct_value_count\": 151}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"percentile_ex_nulls\", \"type\": \"quantitative\"}, {\"field\": \"percentile_inc_nulls\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"percentile_ex_nulls\", \"sort\": \"descending\", \"title\": \"Percentile\", \"type\": \"quantitative\"}, \"y\": {\"field\": \"value_count\", \"title\": \"Count of values\", \"type\": \"quantitative\"}}, \"title\": {\"text\": \"Distribution of counts of values in column transaction_date\", \"subtitle\": \"In this col, 0 values (0.0%) are null and there are 151 distinct values\"}}, {\"mark\": \"bar\", \"data\": {\"values\": [{\"value_count\": 1389, \"group_name\": \"transaction_date\", \"value\": \"2022-05-07\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1372, \"group_name\": \"transaction_date\", \"value\": \"2022-05-09\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1338, \"group_name\": \"transaction_date\", \"value\": \"2022-05-10\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1318, \"group_name\": \"transaction_date\", \"value\": \"2022-05-04\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1311, \"group_name\": \"transaction_date\", \"value\": \"2022-05-05\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, 
{\"value_count\": 1296, \"group_name\": \"transaction_date\", \"value\": \"2022-05-06\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1294, \"group_name\": \"transaction_date\", \"value\": \"2022-05-08\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1292, \"group_name\": \"transaction_date\", \"value\": \"2022-05-01\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1262, \"group_name\": \"transaction_date\", \"value\": \"2022-05-03\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 1259, \"group_name\": \"transaction_date\", \"value\": \"2022-04-30\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value\", \"type\": \"nominal\"}, {\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"value\", \"sort\": \"-y\", \"title\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"value_count\", \"title\": \"Value count\", \"type\": \"quantitative\"}}, \"title\": \"Top 10 values by value count\"}, {\"mark\": \"bar\", \"data\": {\"values\": [{\"value_count\": 2, \"group_name\": \"transaction_date\", \"value\": \"2022-01-01\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 3, \"group_name\": \"transaction_date\", \"value\": \"2022-05-30\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 5, \"group_name\": \"transaction_date\", \"value\": \"2022-05-31\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, 
\"distinct_value_count\": 151}, {\"value_count\": 12, \"group_name\": \"transaction_date\", \"value\": \"2022-05-29\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 13, \"group_name\": \"transaction_date\", \"value\": \"2022-01-02\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 15, \"group_name\": \"transaction_date\", \"value\": \"2022-05-28\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 16, \"group_name\": \"transaction_date\", \"value\": \"2022-05-27\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 16, \"group_name\": \"transaction_date\", \"value\": \"2022-05-26\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 18, \"group_name\": \"transaction_date\", \"value\": \"2022-01-03\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}, {\"value_count\": 25, \"group_name\": \"transaction_date\", \"value\": \"2022-05-25\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 151}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value\", \"type\": \"nominal\"}, {\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"value\", \"sort\": \"-y\", \"title\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"value_count\", \"scale\": {\"domain\": [0, 1389]}, \"title\": \"Value count\", \"type\": \"quantitative\"}}, \"title\": \"Bottom 10 values by value count\"}]}, {\"hconcat\": [{\"mark\": {\"type\": \"line\", \"interpolate\": \"step-after\"}, \"data\": {\"values\": [{\"percentile_ex_nulls\": 
0.999845564365387, \"percentile_inc_nulls\": 0.999845564365387, \"value_count\": 14, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 14.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.9989961385726929, \"percentile_inc_nulls\": 0.9989961385726929, \"value_count\": 11, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 77.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.997672438621521, \"percentile_inc_nulls\": 0.997672438621521, \"value_count\": 10, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 120.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.99360191822052, \"percentile_inc_nulls\": 0.99360191822052, \"value_count\": 9, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 369.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.9860124588012695, \"percentile_inc_nulls\": 0.9860124588012695, \"value_count\": 8, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 688.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.9686383008956909, \"percentile_inc_nulls\": 0.9686383008956909, \"value_count\": 7, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 1575.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.9321691989898682, \"percentile_inc_nulls\": 0.9321691989898682, \"value_count\": 6, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 3306.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.8737589716911316, 
\"percentile_inc_nulls\": 0.8737589716911316, \"value_count\": 5, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 5295.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.7760226130485535, \"percentile_inc_nulls\": 0.7760226130485535, \"value_count\": 4, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 8860.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.634679913520813, \"percentile_inc_nulls\": 0.634679913520813, \"value_count\": 3, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 12813.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.4125556945800781, \"percentile_inc_nulls\": 0.4125556945800781, \"value_count\": 2, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 20136.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 0.0, \"percentile_inc_nulls\": 0.0, \"value_count\": 1, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 37399.0, \"distinct_value_count\": 55935}, {\"percentile_ex_nulls\": 1.0, \"percentile_inc_nulls\": 1.0, \"value_count\": 14, \"group_name\": \"amount\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"sum_tokens_in_value_count_group\": 14.0, \"distinct_value_count\": 55935}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"percentile_ex_nulls\", \"type\": \"quantitative\"}, {\"field\": \"percentile_inc_nulls\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"percentile_ex_nulls\", 
\"sort\": \"descending\", \"title\": \"Percentile\", \"type\": \"quantitative\"}, \"y\": {\"field\": \"value_count\", \"title\": \"Count of values\", \"type\": \"quantitative\"}}, \"title\": {\"text\": \"Distribution of counts of values in column amount\", \"subtitle\": \"In this col, 0 values (0.0%) are null and there are 55935 distinct values\"}}, {\"mark\": \"bar\", \"data\": {\"values\": [{\"value_count\": 14, \"group_name\": \"amount\", \"value\": \"80.68\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"72.91\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"116.14\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"36.3\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"88.72\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"48.92\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"74.87\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 11, \"group_name\": \"amount\", \"value\": \"52.15\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 10, \"group_name\": \"amount\", \"value\": \"157.45\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 10, \"group_name\": \"amount\", \"value\": 
\"99.35\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value\", \"type\": \"nominal\"}, {\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"value\", \"sort\": \"-y\", \"title\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"value_count\", \"title\": \"Value count\", \"type\": \"quantitative\"}}, \"title\": \"Top 10 values by value count\"}, {\"mark\": \"bar\", \"data\": {\"values\": [{\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"26245.91\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"5961.04\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"1177.03\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"44053.98\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"391.19\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"55769.58\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"10742.96\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"16517.33\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, 
\"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"11646.21\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}, {\"value_count\": 1, \"group_name\": \"amount\", \"value\": \"16.78\", \"total_non_null_rows\": 90652, \"total_rows_inc_nulls\": 90652, \"distinct_value_count\": 55935}]}, \"encoding\": {\"tooltip\": [{\"field\": \"value\", \"type\": \"nominal\"}, {\"field\": \"value_count\", \"type\": \"quantitative\"}, {\"field\": \"total_non_null_rows\", \"type\": \"quantitative\"}, {\"field\": \"total_rows_inc_nulls\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"value\", \"sort\": \"-y\", \"title\": null, \"type\": \"nominal\"}, \"y\": {\"field\": \"value_count\", \"scale\": {\"domain\": [0, 14]}, \"title\": \"Value count\", \"type\": \"quantitative\"}}, \"title\": \"Bottom 10 values by value count\"}]}], \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.9.3.json\"}, {\"mode\": \"vega-lite\"});\n",
              "</script>"
            ],
            "text/plain": [
              "alt.VConcatChart(...)"
            ]
          },
          "execution_count": 3,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from splink.exploratory import profile_columns\n",
        "\n",
        "db_api = DuckDBAPI()\n",
        "profile_columns(\n",
        "    [df_origin, df_destination],\n",
        "    db_api=db_api,\n",
        "    column_expressions=[\n",
        "        \"memo\",\n",
        "        \"transaction_date\",\n",
        "        \"amount\",\n",
        "    ],\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:32.724189Z",
          "iopub.status.busy": "2024-06-07T09:22:32.723901Z",
          "iopub.status.idle": "2024-06-07T09:22:33.500975Z",
          "shell.execute_reply": "2024-06-07T09:22:33.500399Z"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "<style>\n",
              "  #altair-viz-5492083b18d2426286f96c164d89c09c.vega-embed {\n",
              "    width: 100%;\n",
              "    display: flex;\n",
              "  }\n",
              "\n",
              "  #altair-viz-5492083b18d2426286f96c164d89c09c.vega-embed details,\n",
              "  #altair-viz-5492083b18d2426286f96c164d89c09c.vega-embed details summary {\n",
              "    position: relative;\n",
              "  }\n",
              "</style>\n",
              "<div id=\"altair-viz-5492083b18d2426286f96c164d89c09c\"></div>\n",
              "<script type=\"text/javascript\">\n",
              "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
              "  (function(spec, embedOpt){\n",
              "    let outputDiv = document.currentScript.previousElementSibling;\n",
              "    if (outputDiv.id !== \"altair-viz-5492083b18d2426286f96c164d89c09c\") {\n",
              "      outputDiv = document.getElementById(\"altair-viz-5492083b18d2426286f96c164d89c09c\");\n",
              "    }\n",
              "    const paths = {\n",
              "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
              "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
              "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
              "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
              "    };\n",
              "\n",
              "    function maybeLoadScript(lib, version) {\n",
              "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
              "      return (VEGA_DEBUG[key] == version) ?\n",
              "        Promise.resolve(paths[lib]) :\n",
              "        new Promise(function(resolve, reject) {\n",
              "          var s = document.createElement('script');\n",
              "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
              "          s.async = true;\n",
              "          s.onload = () => {\n",
              "            VEGA_DEBUG[key] = version;\n",
              "            return resolve(paths[lib]);\n",
              "          };\n",
              "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
              "          s.src = paths[lib];\n",
              "        });\n",
              "    }\n",
              "\n",
              "    function showError(err) {\n",
              "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
              "      throw err;\n",
              "    }\n",
              "\n",
              "    function displayChart(vegaEmbed) {\n",
              "      vegaEmbed(outputDiv, spec, embedOpt)\n",
              "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
              "    }\n",
              "\n",
              "    if(typeof define === \"function\" && define.amd) {\n",
              "      requirejs.config({paths});\n",
              "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
              "    } else {\n",
              "      maybeLoadScript(\"vega\", \"5\")\n",
              "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
              "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
              "        .catch(showError)\n",
              "        .then(() => displayChart(vegaEmbed));\n",
              "    }\n",
              "  })({\"config\": {\"view\": {\"continuousWidth\": 300, \"continuousHeight\": 300}}, \"data\": {\"name\": \"data-843f466f5c4fe33e35d54e49194ad7f3\"}, \"mark\": \"bar\", \"encoding\": {\"order\": {\"field\": \"cumulative_rows\"}, \"tooltip\": [{\"field\": \"blocking_rule\", \"title\": \"SQL Condition\", \"type\": \"nominal\"}, {\"field\": \"row_count\", \"format\": \",\", \"title\": \"Comparisons Generated\", \"type\": \"quantitative\"}, {\"field\": \"cumulative_rows\", \"format\": \",\", \"title\": \"Cumulative Comparisons\", \"type\": \"quantitative\"}, {\"field\": \"cartesian\", \"format\": \",\", \"title\": \"Total comparisons in Cartesian product\", \"type\": \"quantitative\"}], \"x\": {\"field\": \"start\", \"title\": \"Comparisons Generated by Rule(s)\", \"type\": \"quantitative\"}, \"x2\": {\"field\": \"cumulative_rows\"}, \"y\": {\"field\": \"blocking_rule\", \"sort\": [\"-x2\"], \"title\": \"SQL Blocking Rule\"}}, \"height\": {\"step\": 20}, \"title\": {\"text\": \"Count of Additional Comparisons Generated by Each Blocking Rule\", \"subtitle\": \"(Counts exclude comparisons already generated by previous rules)\"}, \"width\": 450, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.9.3.json\", \"datasets\": {\"data-843f466f5c4fe33e35d54e49194ad7f3\": [{\"blocking_rule\": \"\\n    strftime(l.transaction_date, '%Y%m') = strftime(r.transaction_date, '%Y%m')\\n    and substr(l.memo, 1,3) = substr(r.memo,1,3)\\n    and l.amount/r.amount > 0.7   and l.amount/r.amount < 1.3\\n\", \"row_count\": 301537, \"cumulative_rows\": 301537, \"cartesian\": 2054446276, \"match_key\": \"0\", \"start\": 0}, {\"blocking_rule\": \"\\n    strftime(l.transaction_date+15, '%Y%m') = strftime(r.transaction_date, '%Y%m')\\n    and substr(l.memo, 1,3) = substr(r.memo,1,3)\\n    and l.amount/r.amount > 0.7   and l.amount/r.amount < 1.3\\n\", \"row_count\": 111878, \"cumulative_rows\": 413415, \"cartesian\": 2054446276, \"match_key\": \"1\", \"start\": 301537}, 
{\"blocking_rule\": \"SUBSTR(l.memo, 1, 9) = SUBSTR(r.memo, 1, 9)\", \"row_count\": 285581, \"cumulative_rows\": 698996, \"cartesian\": 2054446276, \"match_key\": \"2\", \"start\": 413415}, {\"blocking_rule\": \"\\nround(l.amount/2,0)*2 = round(r.amount/2,0)*2 and yearweek(r.transaction_date) = yearweek(l.transaction_date)\\n\", \"row_count\": 341312, \"cumulative_rows\": 1040308, \"cartesian\": 2054446276, \"match_key\": \"3\", \"start\": 698996}, {\"blocking_rule\": \"\\nround(l.amount/2,0)*2 = round((r.amount+1)/2,0)*2 and yearweek(r.transaction_date) = yearweek(l.transaction_date + 4)\\n\", \"row_count\": 260358, \"cumulative_rows\": 1300666, \"cartesian\": 2054446276, \"match_key\": \"4\", \"start\": 1040308}, {\"blocking_rule\": \"l.\\\"unique_id\\\" = r.\\\"unique_id\\\"\", \"row_count\": 665, \"cumulative_rows\": 1301331, \"cartesian\": 2054446276, \"match_key\": \"5\", \"start\": 1300666}]}}, {\"mode\": \"vega-lite\"});\n",
              "</script>"
            ],
            "text/plain": [
              "alt.Chart(...)"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from splink import DuckDBAPI, block_on\n",
        "from splink.blocking_analysis import (\n",
        "    cumulative_comparisons_to_be_scored_from_blocking_rules_chart,\n",
        ")\n",
        "\n",
        "# Design blocking rules that allow for differences in transaction date and amounts\n",
        "blocking_rule_date_1 = \"\"\"\n",
        "    strftime(l.transaction_date, '%Y%m') = strftime(r.transaction_date, '%Y%m')\n",
        "    and substr(l.memo, 1,3) = substr(r.memo,1,3)\n",
        "    and l.amount/r.amount > 0.7   and l.amount/r.amount < 1.3\n",
        "\"\"\"\n",
        "\n",
        "# Offset by half a month to ensure we capture case when the dates are e.g. 31st Jan and 1st Feb\n",
        "blocking_rule_date_2 = \"\"\"\n",
        "    strftime(l.transaction_date+15, '%Y%m') = strftime(r.transaction_date, '%Y%m')\n",
        "    and substr(l.memo, 1,3) = substr(r.memo,1,3)\n",
        "    and l.amount/r.amount > 0.7   and l.amount/r.amount < 1.3\n",
        "\"\"\"\n",
        "\n",
        "blocking_rule_memo = block_on(\"substr(memo,1,9)\")\n",
        "\n",
        "blocking_rule_amount_1 = \"\"\"\n",
        "round(l.amount/2,0)*2 = round(r.amount/2,0)*2 and yearweek(r.transaction_date) = yearweek(l.transaction_date)\n",
        "\"\"\"\n",
        "\n",
        "blocking_rule_amount_2 = \"\"\"\n",
        "round(l.amount/2,0)*2 = round((r.amount+1)/2,0)*2 and yearweek(r.transaction_date) = yearweek(l.transaction_date + 4)\n",
        "\"\"\"\n",
        "\n",
        "blocking_rule_cheat = block_on(\"unique_id\")\n",
        "\n",
        "\n",
        "brs = [\n",
        "    blocking_rule_date_1,\n",
        "    blocking_rule_date_2,\n",
        "    blocking_rule_memo,\n",
        "    blocking_rule_amount_1,\n",
        "    blocking_rule_amount_2,\n",
        "    blocking_rule_cheat,\n",
        "]\n",
        "\n",
        "\n",
        "db_api = DuckDBAPI()\n",
        "\n",
        "cumulative_comparisons_to_be_scored_from_blocking_rules_chart(\n",
        "    table_or_tables=[df_origin, df_destination],\n",
        "    blocking_rules=brs,\n",
        "    db_api=db_api,\n",
        "    link_type=\"link_only\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:33.504001Z",
          "iopub.status.busy": "2024-06-07T09:22:33.503779Z",
          "iopub.status.idle": "2024-06-07T09:22:33.511675Z",
          "shell.execute_reply": "2024-06-07T09:22:33.511212Z"
        }
      },
      "outputs": [],
      "source": [
        "# Full settings for linking model\n",
        "import splink.comparison_level_library as cll\n",
        "import splink.comparison_library as cl\n",
        "\n",
        "comparison_amount = {\n",
        "    \"output_column_name\": \"amount\",\n",
        "    \"comparison_levels\": [\n",
        "        cll.NullLevel(\"amount\"),\n",
        "        cll.ExactMatchLevel(\"amount\"),\n",
        "        cll.PercentageDifferenceLevel(\"amount\", 0.01),\n",
        "        cll.PercentageDifferenceLevel(\"amount\", 0.03),\n",
        "        cll.PercentageDifferenceLevel(\"amount\", 0.1),\n",
        "        cll.PercentageDifferenceLevel(\"amount\", 0.3),\n",
        "        cll.ElseLevel(),\n",
        "    ],\n",
        "    \"comparison_description\": \"Amount percentage difference\",\n",
        "}\n",
        "\n",
        "# The date distance is one sided becaause transactions should only arrive after they've left\n",
        "# As a result, the comparison_template_library date difference functions are not appropriate\n",
        "within_n_days_template = \"transaction_date_r - transaction_date_l <= {n} and transaction_date_r >= transaction_date_l\"\n",
        "\n",
        "comparison_date = {\n",
        "    \"output_column_name\": \"transaction_date\",\n",
        "    \"comparison_levels\": [\n",
        "        cll.NullLevel(\"transaction_date\"),\n",
        "        {\n",
        "            \"sql_condition\": within_n_days_template.format(n=1),\n",
        "            \"label_for_charts\": \"1 day\",\n",
        "        },\n",
        "        {\n",
        "            \"sql_condition\": within_n_days_template.format(n=4),\n",
        "            \"label_for_charts\": \"<=4 days\",\n",
        "        },\n",
        "        {\n",
        "            \"sql_condition\": within_n_days_template.format(n=10),\n",
        "            \"label_for_charts\": \"<=10 days\",\n",
        "        },\n",
        "        {\n",
        "            \"sql_condition\": within_n_days_template.format(n=30),\n",
        "            \"label_for_charts\": \"<=30 days\",\n",
        "        },\n",
        "        cll.ElseLevel(),\n",
        "    ],\n",
        "    \"comparison_description\": \"Transaction date days apart\",\n",
        "}\n",
        "\n",
        "\n",
        "settings = SettingsCreator(\n",
        "    link_type=\"link_only\",\n",
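        "    # Each origin payment should have exactly one match in the destination data, so the prior is known\n",
        "    probability_two_random_records_match=1 / len(df_origin),\n",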
        "    blocking_rules_to_generate_predictions=[\n",
        "        blocking_rule_date_1,\n",
        "        blocking_rule_date_2,\n",
        "        blocking_rule_memo,\n",
        "        blocking_rule_amount_1,\n",
        "        blocking_rule_amount_2,\n",
        "        blocking_rule_cheat,\n",
        "    ],\n",
        "    comparisons=[\n",
        "        comparison_amount,\n",
        "        cl.LevenshteinAtThresholds(\"memo\", [2, 6, 10]),\n",
        "        comparison_date,\n",
        "    ],\n",
        "    retain_intermediate_calculation_columns=True,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:33.514381Z",
          "iopub.status.busy": "2024-06-07T09:22:33.514150Z",
          "iopub.status.idle": "2024-06-07T09:22:33.621746Z",
          "shell.execute_reply": "2024-06-07T09:22:33.621038Z"
        }
      },
      "outputs": [],
      "source": [
        "linker = Linker(\n",
        "    [df_origin, df_destination],\n",
        "    settings,\n",
        "    input_table_aliases=[\"__ori\", \"_dest\"],\n",
        "    db_api=db_api,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:33.625044Z",
          "iopub.status.busy": "2024-06-07T09:22:33.624807Z",
          "iopub.status.idle": "2024-06-07T09:22:35.145751Z",
          "shell.execute_reply": "2024-06-07T09:22:35.145280Z"
        }
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "You are using the default value for `max_pairs`, which may be too small and thus lead to inaccurate estimates for your model's u-parameters. Consider increasing to 1e8 or 1e9, which will result in more accurate estimates, but with a longer run time.\n",
            "----- Estimating u probabilities using random sampling -----\n",
            "\n",
            "Estimated u probabilities using random sampling\n",
            "\n",
            "Your model is not yet fully trained. Missing estimates for:\n",
            "    - amount (no m values are trained).\n",
            "    - memo (no m values are trained).\n",
            "    - transaction_date (no m values are trained).\n"
          ]
        }
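      ],
      "source": [
        "# Estimate the u probabilities (how often values agree among non-matching\n",
        "# pairs) from randomly sampled record pairs.\n",
        "linker.training.estimate_u_using_random_sampling(max_pairs=1e6)"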
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:35.148614Z",
          "iopub.status.busy": "2024-06-07T09:22:35.148331Z",
          "iopub.status.idle": "2024-06-07T09:22:36.323460Z",
          "shell.execute_reply": "2024-06-07T09:22:36.322736Z"
        }
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "\n",
            "----- Starting EM training session -----\n",
            "\n",
            "Estimating the m probabilities of the model by blocking on:\n",
            "l.\"memo\" = r.\"memo\"\n",
            "\n",
            "Parameter estimates will be made for the following comparison(s):\n",
            "    - amount\n",
            "    - transaction_date\n",
            "\n",
            "Parameter estimates cannot be made for the following comparison(s) since they are used in the blocking rules: \n",
            "    - memo\n",
            "\n",
            "Iteration 1: Largest change in params was -0.588 in the m_probability of amount, level `Exact match on amount`\n",
            "Iteration 2: Largest change in params was -0.176 in the m_probability of transaction_date, level `1 day`\n",
            "Iteration 3: Largest change in params was 0.00996 in the m_probability of amount, level `Percentage difference of 'amount' within 10.00%`\n",
            "Iteration 4: Largest change in params was 0.0022 in the m_probability of transaction_date, level `<=30 days`\n",
            "Iteration 5: Largest change in params was 0.000385 in the m_probability of transaction_date, level `<=30 days`\n",
            "Iteration 6: Largest change in params was -0.000255 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 7: Largest change in params was -0.000229 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 8: Largest change in params was -0.000208 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 9: Largest change in params was -0.00019 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 10: Largest change in params was -0.000173 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 11: Largest change in params was -0.000159 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 12: Largest change in params was -0.000146 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 13: Largest change in params was -0.000135 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 14: Largest change in params was -0.000124 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 15: Largest change in params was -0.000115 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 16: Largest change in params was -0.000107 in the m_probability of amount, level `All other comparisons`\n",
            "Iteration 17: Largest change in params was -9.92e-05 in the m_probability of amount, level `All other comparisons`\n",
            "\n",
            "EM converged after 17 iterations\n",
            "\n",
            "Your model is not yet fully trained. Missing estimates for:\n",
            "    - memo (no m values are trained).\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "<EMTrainingSession, blocking on l.\"memo\" = r.\"memo\", deactivating comparisons memo>"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
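      ],
      "source": [
        "# First EM pass: block on memo to estimate m probabilities for amount and\n",
        "# transaction_date. memo itself cannot be trained here because it is the\n",
        "# blocking column.\n",
        "linker.training.estimate_parameters_using_expectation_maximisation(block_on(\"memo\"))"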
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:36.326561Z",
          "iopub.status.busy": "2024-06-07T09:22:36.326344Z",
          "iopub.status.idle": "2024-06-07T09:22:37.563023Z",
          "shell.execute_reply": "2024-06-07T09:22:37.562461Z"
        }
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "\n",
            "----- Starting EM training session -----\n",
            "\n",
            "Estimating the m probabilities of the model by blocking on:\n",
            "l.\"amount\" = r.\"amount\"\n",
            "\n",
            "Parameter estimates will be made for the following comparison(s):\n",
            "    - memo\n",
            "    - transaction_date\n",
            "\n",
            "Parameter estimates cannot be made for the following comparison(s) since they are used in the blocking rules: \n",
            "    - amount\n",
            "\n",
            "Iteration 1: Largest change in params was -0.373 in the m_probability of memo, level `Exact match on memo`\n",
            "Iteration 2: Largest change in params was -0.108 in the m_probability of memo, level `Exact match on memo`\n",
            "Iteration 3: Largest change in params was 0.0202 in the m_probability of memo, level `Levenshtein distance of memo <= 10`\n",
            "Iteration 4: Largest change in params was -0.00538 in the m_probability of memo, level `Exact match on memo`\n",
            "Iteration 5: Largest change in params was 0.00482 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 6: Largest change in params was 0.00508 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 7: Largest change in params was 0.00502 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 8: Largest change in params was 0.00466 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 9: Largest change in params was 0.00409 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 10: Largest change in params was 0.00343 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 11: Largest change in params was 0.00276 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 12: Largest change in params was 0.00216 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 13: Largest change in params was 0.00165 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 14: Largest change in params was 0.00124 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 15: Largest change in params was 0.000915 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 16: Largest change in params was 0.000671 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 17: Largest change in params was 0.000488 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 18: Largest change in params was 0.000353 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 19: Largest change in params was 0.000255 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 20: Largest change in params was 0.000183 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 21: Largest change in params was 0.000132 in the m_probability of memo, level `All other comparisons`\n",
            "Iteration 22: Largest change in params was 9.45e-05 in the m_probability of memo, level `All other comparisons`\n",
            "\n",
            "EM converged after 22 iterations\n",
            "\n",
            "Your model is fully trained. All comparisons have at least one estimate for their m and u values\n"
          ]
        }
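      ],
      "source": [
        "# Second EM pass: block on amount so that the m probabilities for memo are\n",
        "# estimated as well.\n",
        "session = linker.training.estimate_parameters_using_expectation_maximisation(block_on(\"amount\"))"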
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:37.565956Z",
          "iopub.status.busy": "2024-06-07T09:22:37.565738Z",
          "iopub.status.idle": "2024-06-07T09:22:37.832159Z",
          "shell.execute_reply": "2024-06-07T09:22:37.831506Z"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "<style>\n",
              "  #altair-viz-83ef44ec9d9048d880186cd914719e2a.vega-embed {\n",
              "    width: 100%;\n",
              "    display: flex;\n",
              "  }\n",
              "\n",
              "  #altair-viz-83ef44ec9d9048d880186cd914719e2a.vega-embed details,\n",
              "  #altair-viz-83ef44ec9d9048d880186cd914719e2a.vega-embed details summary {\n",
              "    position: relative;\n",
              "  }\n",
              "</style>\n",
              "<div id=\"altair-viz-83ef44ec9d9048d880186cd914719e2a\"></div>\n",
              "<script type=\"text/javascript\">\n",
              "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
              "  (function(spec, embedOpt){\n",
              "    let outputDiv = document.currentScript.previousElementSibling;\n",
              "    if (outputDiv.id !== \"altair-viz-83ef44ec9d9048d880186cd914719e2a\") {\n",
              "      outputDiv = document.getElementById(\"altair-viz-83ef44ec9d9048d880186cd914719e2a\");\n",
              "    }\n",
              "    const paths = {\n",
              "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
              "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
              "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
              "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
              "    };\n",
              "\n",
              "    function maybeLoadScript(lib, version) {\n",
              "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
              "      return (VEGA_DEBUG[key] == version) ?\n",
              "        Promise.resolve(paths[lib]) :\n",
              "        new Promise(function(resolve, reject) {\n",
              "          var s = document.createElement('script');\n",
              "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
              "          s.async = true;\n",
              "          s.onload = () => {\n",
              "            VEGA_DEBUG[key] = version;\n",
              "            return resolve(paths[lib]);\n",
              "          };\n",
              "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
              "          s.src = paths[lib];\n",
              "        });\n",
              "    }\n",
              "\n",
              "    function showError(err) {\n",
              "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
              "      throw err;\n",
              "    }\n",
              "\n",
              "    function displayChart(vegaEmbed) {\n",
              "      vegaEmbed(outputDiv, spec, embedOpt)\n",
              "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
              "    }\n",
              "\n",
              "    if(typeof define === \"function\" && define.amd) {\n",
              "      requirejs.config({paths});\n",
              "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
              "    } else {\n",
              "      maybeLoadScript(\"vega\", \"5\")\n",
              "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
              "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
              "        .catch(showError)\n",
              "        .then(() => displayChart(vegaEmbed));\n",
              "    }\n",
              "  })({\"config\": {\"view\": {\"continuousWidth\": 300, \"continuousHeight\": 300, \"discreteHeight\": 60, \"discreteWidth\": 400}, \"header\": {\"title\": null}, \"mark\": {\"tooltip\": null}, \"title\": {\"anchor\": \"middle\"}}, \"vconcat\": [{\"mark\": {\"type\": \"bar\", \"clip\": true, \"height\": 15}, \"encoding\": {\"color\": {\"field\": \"log2_bayes_factor\", \"scale\": {\"domain\": [-10, 0, 10], \"interpolate\": \"lab\", \"range\": [\"red\", \"#bbbbbb\", \"green\"]}, \"title\": \"Match weight\", \"type\": \"quantitative\"}, \"tooltip\": [{\"field\": \"comparison_name\", \"title\": \"Comparison name\", \"type\": \"nominal\"}, {\"field\": \"probability_two_random_records_match\", \"format\": \".4f\", \"title\": \"Probability two random records match\", \"type\": \"nominal\"}, {\"field\": \"log2_bayes_factor\", \"format\": \",.4f\", \"title\": \"Equivalent match weight\", \"type\": \"quantitative\"}, {\"field\": \"bayes_factor_description\", \"title\": \"Match weight description\", \"type\": \"nominal\"}], \"x\": {\"axis\": {\"domain\": false, \"gridColor\": {\"condition\": {\"test\": \"abs(datum.value / 10)  <= 1 & datum.value % 10 === 0\", \"value\": \"#aaa\"}, \"value\": \"#ddd\"}, \"gridDash\": {\"condition\": {\"test\": \"abs(datum.value / 10) == 1\", \"value\": [3]}, \"value\": null}, \"gridWidth\": {\"condition\": {\"test\": \"abs(datum.value / 10)  <= 1 & datum.value % 10 === 0\", \"value\": 2}, \"value\": 1}, \"labels\": false, \"ticks\": false, \"title\": \"\"}, \"field\": \"log2_bayes_factor\", \"scale\": {\"domain\": [-16, 16]}, \"type\": \"quantitative\"}, \"y\": {\"axis\": {\"title\": \"Prior (starting) match weight\", \"titleAlign\": \"right\", \"titleAngle\": 0, \"titleFontWeight\": \"normal\"}, \"field\": \"label_for_charts\", \"sort\": {\"field\": \"comparison_vector_value\", \"order\": \"descending\"}, \"type\": \"nominal\"}}, \"height\": 20, \"transform\": [{\"filter\": \"(datum.comparison_name == 
'probability_two_random_records_match')\"}]}, {\"mark\": {\"type\": \"bar\", \"clip\": true}, \"encoding\": {\"color\": {\"field\": \"log2_bayes_factor\", \"scale\": {\"domain\": [-10, 0, 10], \"interpolate\": \"lab\", \"range\": [\"red\", \"#bbbbbb\", \"green\"]}, \"title\": \"Match weight\", \"type\": \"quantitative\"}, \"row\": {\"field\": \"comparison_name\", \"header\": {\"labelAlign\": \"left\", \"labelAnchor\": \"middle\", \"labelAngle\": 0}, \"sort\": {\"field\": \"comparison_sort_order\"}, \"type\": \"nominal\"}, \"tooltip\": [{\"field\": \"comparison_name\", \"title\": \"Comparison name\", \"type\": \"nominal\"}, {\"field\": \"label_for_charts\", \"title\": \"Label\", \"type\": \"ordinal\"}, {\"field\": \"sql_condition\", \"title\": \"SQL condition\", \"type\": \"nominal\"}, {\"field\": \"m_probability\", \"format\": \".4f\", \"title\": \"M probability\", \"type\": \"quantitative\"}, {\"field\": \"u_probability\", \"format\": \".4f\", \"title\": \"U probability\", \"type\": \"quantitative\"}, {\"field\": \"bayes_factor\", \"format\": \",.4f\", \"title\": \"Bayes factor = m/u\", \"type\": \"quantitative\"}, {\"field\": \"log2_bayes_factor\", \"format\": \",.4f\", \"title\": \"Match weight = log2(m/u)\", \"type\": \"quantitative\"}, {\"field\": \"bayes_factor_description\", \"title\": \"Match weight description\", \"type\": \"nominal\"}], \"x\": {\"axis\": {\"gridColor\": {\"condition\": {\"test\": \"abs(datum.value / 10)  <= 1 & datum.value % 10 === 0\", \"value\": \"#aaa\"}, \"value\": \"#ddd\"}, \"gridDash\": {\"condition\": {\"test\": \"abs(datum.value / 10) == 1\", \"value\": [3]}, \"value\": null}, \"gridWidth\": {\"condition\": {\"test\": \"abs(datum.value / 10)  <= 1 & datum.value % 10 === 0\", \"value\": 2}, \"value\": 1}, \"title\": \"Comparison level match weight = log2(m/u)\"}, \"field\": \"log2_bayes_factor\", \"scale\": {\"domain\": [-16, 16]}, \"type\": \"quantitative\"}, \"y\": {\"axis\": {\"title\": null}, \"field\": \"label_for_charts\", 
\"sort\": {\"field\": \"comparison_vector_value\", \"order\": \"descending\"}, \"type\": \"nominal\"}}, \"height\": {\"step\": 12}, \"resolve\": {\"axis\": {\"y\": \"independent\"}, \"scale\": {\"y\": \"independent\"}}, \"transform\": [{\"filter\": \"(datum.comparison_name != 'probability_two_random_records_match')\"}]}], \"data\": {\"name\": \"data-a714ce1f8256a6d0f3a57df59f45903e\"}, \"params\": [{\"name\": \"mouse_zoom\", \"select\": {\"type\": \"interval\", \"encodings\": [\"x\"]}, \"bind\": \"scales\", \"views\": []}], \"resolve\": {\"axis\": {\"y\": \"independent\"}, \"scale\": {\"y\": \"independent\"}}, \"title\": {\"text\": \"Model parameters (components of final match weight)\", \"subtitle\": \"Use mousewheel to zoom\"}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.9.3.json\", \"datasets\": {\"data-a714ce1f8256a6d0f3a57df59f45903e\": [{\"comparison_name\": \"probability_two_random_records_match\", \"sql_condition\": null, \"label_for_charts\": \"\", \"m_probability\": null, \"u_probability\": null, \"m_probability_description\": null, \"u_probability_description\": null, \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": null, \"is_null_level\": false, \"bayes_factor\": 2.206287920573635e-05, \"log2_bayes_factor\": -15.468019399518882, \"comparison_vector_value\": 0, \"max_comparison_vector_value\": 0, \"bayes_factor_description\": \"The probability that two random records drawn at random match is 0.000 or one in  45,326.0 records.This is equivalent to a starting match weight of -15.468.\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": -1}, {\"comparison_name\": \"amount\", \"sql_condition\": \"\\\"amount_l\\\" = \\\"amount_r\\\"\", \"label_for_charts\": \"Exact match on amount\", \"m_probability\": 0.24685872265268072, \"u_probability\": 2.2317307477718196e-05, \"m_probability_description\": \"Amongst matching record comparisons, 24.69% of records are in the 
exact match on amount comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 0.00% of records are in the exact match on amount comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 11061.312969727496, \"log2_bayes_factor\": 13.433235022128335, \"comparison_vector_value\": 5, \"max_comparison_vector_value\": 5, \"bayes_factor_description\": \"If comparison level is `exact match on amount` then comparison is 11,061.31 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 0}, {\"comparison_name\": \"amount\", \"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.01\", \"label_for_charts\": \"Percentage difference of 'amount' within 1.00%\", \"m_probability\": 0.19779258983250028, \"u_probability\": 0.0034510672745089684, \"m_probability_description\": \"Amongst matching record comparisons, 19.78% of records are in the percentage difference of 'amount' within 1.00% comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 0.35% of records are in the percentage difference of 'amount' within 1.00% comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 57.31345525874833, \"log2_bayes_factor\": 5.840801969581078, \"comparison_vector_value\": 4, \"max_comparison_vector_value\": 5, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 1.00%` then comparison is 57.31 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 0}, {\"comparison_name\": \"amount\", \"sql_condition\": \"(ABS(\\\"amount_l\\\" - 
\\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.03\", \"label_for_charts\": \"Percentage difference of 'amount' within 3.00%\", \"m_probability\": 0.31721909917341545, \"u_probability\": 0.007092846085645711, \"m_probability_description\": \"Amongst matching record comparisons, 31.72% of records are in the percentage difference of 'amount' within 3.00% comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 0.71% of records are in the percentage difference of 'amount' within 3.00% comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 44.723809785664734, \"log2_bayes_factor\": 5.482971183890822, \"comparison_vector_value\": 3, \"max_comparison_vector_value\": 5, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 3.00%` then comparison is 44.72 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 0}, {\"comparison_name\": \"amount\", \"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.1\", \"label_for_charts\": \"Percentage difference of 'amount' within 10.00%\", \"m_probability\": 0.23149568734526935, \"u_probability\": 0.02613255263334084, \"m_probability_description\": \"Amongst matching record comparisons, 23.15% of records are in the percentage difference of 'amount' within 10.00% comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 2.61% of records are in the percentage difference of 'amount' within 10.00% comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 8.858517979216424, 
\"log2_bayes_factor\": 3.1470653575978704, \"comparison_vector_value\": 2, \"max_comparison_vector_value\": 5, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 10.00%` then comparison is 8.86 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 0}, {\"comparison_name\": \"amount\", \"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.3\", \"label_for_charts\": \"Percentage difference of 'amount' within 30.00%\", \"m_probability\": 0.0007603567508388371, \"u_probability\": 0.08587395590505811, \"m_probability_description\": \"Amongst matching record comparisons, 0.08% of records are in the percentage difference of 'amount' within 30.00% comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 8.59% of records are in the percentage difference of 'amount' within 30.00% comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 0.008854334737757794, \"log2_bayes_factor\": -6.819400369169093, \"comparison_vector_value\": 1, \"max_comparison_vector_value\": 5, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 30.00%` then comparison is  112.94 times less likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 0}, {\"comparison_name\": \"amount\", \"sql_condition\": \"ELSE\", \"label_for_charts\": \"All other comparisons\", \"m_probability\": 0.005873544245295437, \"u_probability\": 0.8774272607939686, \"m_probability_description\": \"Amongst matching record comparisons, 0.59% of records are in the all other comparisons comparison level\", \"u_probability_description\": \"Amongst non-matching record 
comparisons, 87.74% of records are in the all other comparisons comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 0.0066940526101053315, \"log2_bayes_factor\": -7.222904395119598, \"comparison_vector_value\": 0, \"max_comparison_vector_value\": 5, \"bayes_factor_description\": \"If comparison level is `all other comparisons` then comparison is  149.39 times less likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 0}, {\"comparison_name\": \"memo\", \"sql_condition\": \"\\\"memo_l\\\" = \\\"memo_r\\\"\", \"label_for_charts\": \"Exact match on memo\", \"m_probability\": 0.4267610924827707, \"u_probability\": 2.2317307477718196e-05, \"m_probability_description\": \"Amongst matching record comparisons, 42.68% of records are in the exact match on memo comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 0.00% of records are in the exact match on memo comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 19122.42742135685, \"log2_bayes_factor\": 14.222978051741993, \"comparison_vector_value\": 4, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `exact match on memo` then comparison is 19,122.43 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 1}, {\"comparison_name\": \"memo\", \"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 2\", \"label_for_charts\": \"Levenshtein distance of memo <= 2\", \"m_probability\": 0.10817684036314526, \"u_probability\": 0.0021962259404209043, \"m_probability_description\": \"Amongst matching record comparisons, 10.82% of records are in the levenshtein distance of memo <= 2 comparison level\", 
\"u_probability_description\": \"Amongst non-matching record comparisons, 0.22% of records are in the levenshtein distance of memo <= 2 comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 49.255788474301184, \"log2_bayes_factor\": 5.622221373008662, \"comparison_vector_value\": 3, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 2` then comparison is 49.26 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 1}, {\"comparison_name\": \"memo\", \"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 6\", \"label_for_charts\": \"Levenshtein distance of memo <= 6\", \"m_probability\": 0.2591582087321121, \"u_probability\": 0.02739956704423493, \"m_probability_description\": \"Amongst matching record comparisons, 25.92% of records are in the levenshtein distance of memo <= 6 comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 2.74% of records are in the levenshtein distance of memo <= 6 comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 9.458478242145832, \"log2_bayes_factor\": 3.241608089578434, \"comparison_vector_value\": 2, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 6` then comparison is 9.46 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 1}, {\"comparison_name\": \"memo\", \"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 10\", \"label_for_charts\": \"Levenshtein distance of memo <= 10\", \"m_probability\": 0.14779726216531575, \"u_probability\": 0.09805717694175792, 
\"m_probability_description\": \"Amongst matching record comparisons, 14.78% of records are in the levenshtein distance of memo <= 10 comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 9.81% of records are in the levenshtein distance of memo <= 10 comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 1.507255937554693, \"log2_bayes_factor\": 0.5919244126159111, \"comparison_vector_value\": 1, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 10` then comparison is 1.51 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 1}, {\"comparison_name\": \"memo\", \"sql_condition\": \"ELSE\", \"label_for_charts\": \"All other comparisons\", \"m_probability\": 0.058106596256656165, \"u_probability\": 0.8723247127661086, \"m_probability_description\": \"Amongst matching record comparisons, 5.81% of records are in the all other comparisons comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 87.23% of records are in the all other comparisons comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 0.06661120040082592, \"log2_bayes_factor\": -3.9080914088124747, \"comparison_vector_value\": 0, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `all other comparisons` then comparison is  15.01 times less likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 1}, {\"comparison_name\": \"transaction_date\", \"sql_condition\": \"transaction_date_r - transaction_date_l <= 1 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"1 
day\", \"m_probability\": 0.38903576440838894, \"u_probability\": 0.01929737000675606, \"m_probability_description\": \"Amongst matching record comparisons, 38.90% of records are in the 1 day comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 1.93% of records are in the 1 day comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 20.160040682859194, \"log2_bayes_factor\": 4.333426645079358, \"comparison_vector_value\": 4, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `1 day` then comparison is 20.16 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 2}, {\"comparison_name\": \"transaction_date\", \"sql_condition\": \"transaction_date_r - transaction_date_l <= 4 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=4 days\", \"m_probability\": 0.466140837509571, \"u_probability\": 0.028893812222174884, \"m_probability_description\": \"Amongst matching record comparisons, 46.61% of records are in the <=4 days comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 2.89% of records are in the <=4 days comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 16.132894957759362, \"log2_bayes_factor\": 4.011933440166542, \"comparison_vector_value\": 3, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `<=4 days` then comparison is 16.13 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 2}, {\"comparison_name\": \"transaction_date\", \"sql_condition\": \"transaction_date_r - transaction_date_l <= 10 and transaction_date_r >= 
transaction_date_l\", \"label_for_charts\": \"<=10 days\", \"m_probability\": 0.09609677790958321, \"u_probability\": 0.05649220618757494, \"m_probability_description\": \"Amongst matching record comparisons, 9.61% of records are in the <=10 days comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 5.65% of records are in the <=10 days comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 1.7010625782697617, \"log2_bayes_factor\": 0.766436215486194, \"comparison_vector_value\": 2, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `<=10 days` then comparison is 1.70 times more likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 2}, {\"comparison_name\": \"transaction_date\", \"sql_condition\": \"transaction_date_r - transaction_date_l <= 30 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=30 days\", \"m_probability\": 0.04864613690105879, \"u_probability\": 0.16279157055008106, \"m_probability_description\": \"Amongst matching record comparisons, 4.86% of records are in the <=30 days comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 16.28% of records are in the <=30 days comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 0.2988246672520021, \"log2_bayes_factor\": -1.7426288508657286, \"comparison_vector_value\": 1, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `<=30 days` then comparison is  3.35 times less likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 2}, {\"comparison_name\": \"transaction_date\", \"sql_condition\": \"ELSE\", 
\"label_for_charts\": \"All other comparisons\", \"m_probability\": 8.04832713980663e-05, \"u_probability\": 0.7325250410334131, \"m_probability_description\": \"Amongst matching record comparisons, 0.01% of records are in the all other comparisons comparison level\", \"u_probability_description\": \"Amongst non-matching record comparisons, 73.25% of records are in the all other comparisons comparison level\", \"has_tf_adjustments\": false, \"tf_adjustment_column\": null, \"tf_adjustment_weight\": 1.0, \"is_null_level\": false, \"bayes_factor\": 0.00010987101722082313, \"log2_bayes_factor\": -13.151901510334673, \"comparison_vector_value\": 0, \"max_comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `all other comparisons` then comparison is  9,101.58 times less likely to be a match\", \"probability_two_random_records_match\": 2.2062392445836827e-05, \"comparison_sort_order\": 2}]}}, {\"mode\": \"vega-lite\"});\n",
              "</script>"
            ],
            "text/plain": [
              "alt.VConcatChart(...)"
            ]
          },
          "execution_count": 10,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Chart the match weights of each comparison level in the trained model.\n",
        "linker.visualisations.match_weights_chart()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:37.835082Z",
          "iopub.status.busy": "2024-06-07T09:22:37.834871Z",
          "iopub.status.idle": "2024-06-07T09:22:58.616771Z",
          "shell.execute_reply": "2024-06-07T09:22:58.615862Z"
        }
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "862ba86b3fa649ddb3c14eee78c00fed",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "# Score the blocked record pairs, keeping only those with match probability above 0.001.\n",
        "df_predict = linker.inference.predict(threshold_match_probability=0.001)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:58.620828Z",
          "iopub.status.busy": "2024-06-07T09:22:58.620523Z",
          "iopub.status.idle": "2024-06-07T09:22:59.018555Z",
          "shell.execute_reply": "2024-06-07T09:22:59.017917Z"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "        <iframe\n",
              "            width=\"100%\"\n",
              "            height=\"1200\"\n",
              "            src=\"./dashboards/comparison_viewer_transactions.html\"\n",
              "            frameborder=\"0\"\n",
              "            allowfullscreen\n",
              "            \n",
              "        ></iframe>\n",
              "        "
            ],
            "text/plain": [
              "<IPython.lib.display.IFrame at 0x12ae78190>"
            ]
          },
          "execution_count": 12,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from IPython.display import IFrame\n",
        "\n",
        "# Write the interactive comparison viewer to an HTML file, then embed it below.\n",
        "linker.visualisations.comparison_viewer_dashboard(\n",
        "    df_predict, \"dashboards/comparison_viewer_transactions.html\", overwrite=True\n",
        ")\n",
        "\n",
        "IFrame(\n",
        "    src=\"./dashboards/comparison_viewer_transactions.html\", width=\"100%\", height=1200\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:22:59.022067Z",
          "iopub.status.busy": "2024-06-07T09:22:59.021794Z",
          "iopub.status.idle": "2024-06-07T09:23:04.254280Z",
          "shell.execute_reply": "2024-06-07T09:23:04.253648Z"
        }
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "cdc0840392db4f8da99156e19a89599e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/html": [
              "\n",
              "<style>\n",
              "  #altair-viz-8799cea88f704d4cb31523677517e755.vega-embed {\n",
              "    width: 100%;\n",
              "    display: flex;\n",
              "  }\n",
              "\n",
              "  #altair-viz-8799cea88f704d4cb31523677517e755.vega-embed details,\n",
              "  #altair-viz-8799cea88f704d4cb31523677517e755.vega-embed details summary {\n",
              "    position: relative;\n",
              "  }\n",
              "</style>\n",
              "<div id=\"altair-viz-8799cea88f704d4cb31523677517e755\"></div>\n",
              "<script type=\"text/javascript\">\n",
              "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
              "  (function(spec, embedOpt){\n",
              "    let outputDiv = document.currentScript.previousElementSibling;\n",
              "    if (outputDiv.id !== \"altair-viz-8799cea88f704d4cb31523677517e755\") {\n",
              "      outputDiv = document.getElementById(\"altair-viz-8799cea88f704d4cb31523677517e755\");\n",
              "    }\n",
              "    const paths = {\n",
              "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
              "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
              "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
              "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
              "    };\n",
              "\n",
              "    function maybeLoadScript(lib, version) {\n",
              "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
              "      return (VEGA_DEBUG[key] == version) ?\n",
              "        Promise.resolve(paths[lib]) :\n",
              "        new Promise(function(resolve, reject) {\n",
              "          var s = document.createElement('script');\n",
              "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
              "          s.async = true;\n",
              "          s.onload = () => {\n",
              "            VEGA_DEBUG[key] = version;\n",
              "            return resolve(paths[lib]);\n",
              "          };\n",
              "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
              "          s.src = paths[lib];\n",
              "        });\n",
              "    }\n",
              "\n",
              "    function showError(err) {\n",
              "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
              "      throw err;\n",
              "    }\n",
              "\n",
              "    function displayChart(vegaEmbed) {\n",
              "      vegaEmbed(outputDiv, spec, embedOpt)\n",
              "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
              "    }\n",
              "\n",
              "    if(typeof define === \"function\" && define.amd) {\n",
              "      requirejs.config({paths});\n",
              "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
              "    } else {\n",
              "      maybeLoadScript(\"vega\", \"5\")\n",
              "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
              "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
              "        .catch(showError)\n",
              "        .then(() => displayChart(vegaEmbed));\n",
              "    }\n",
              "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"layer\": [{\"mark\": \"rule\", \"encoding\": {\"color\": {\"value\": \"black\"}, \"size\": {\"value\": 0.5}, \"y\": {\"field\": \"zero\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"bar\", \"width\": 60}, \"encoding\": {\"color\": {\"condition\": {\"test\": \"(datum.log2_bayes_factor < 0)\", \"value\": \"red\"}, \"value\": \"green\"}, \"opacity\": {\"condition\": {\"test\": \"datum.column_name == 'Prior match weight' || datum.column_name == 'Final score'\", \"value\": 1}, \"value\": 0.5}, \"tooltip\": [{\"field\": \"column_name\", \"title\": \"Comparison column\", \"type\": \"nominal\"}, {\"field\": \"value_l\", \"title\": \"Value (L)\", \"type\": \"nominal\"}, {\"field\": \"value_r\", \"title\": \"Value (R)\", \"type\": \"nominal\"}, {\"field\": \"label_for_charts\", \"title\": \"Label\", \"type\": \"ordinal\"}, {\"field\": \"sql_condition\", \"title\": \"SQL condition\", \"type\": \"nominal\"}, {\"field\": \"comparison_vector_value\", \"title\": \"Comparison vector value\", \"type\": \"nominal\"}, {\"field\": \"bayes_factor\", \"format\": \",.4f\", \"title\": \"Bayes factor = m/u\", \"type\": \"quantitative\"}, {\"field\": \"log2_bayes_factor\", \"format\": \",.4f\", \"title\": \"Match weight = log2(m/u)\", \"type\": \"quantitative\"}, {\"field\": \"prob\", \"format\": \".4f\", \"title\": \"Cumulative match probability\", \"type\": \"quantitative\"}, {\"field\": \"bayes_factor_description\", \"title\": \"Match weight description\", \"type\": \"nominal\"}], \"x\": {\"axis\": {\"grid\": true, \"labelAlign\": \"center\", \"labelAngle\": -20, \"labelExpr\": \"datum.value == 'Prior' || datum.value == 'Final score' ? 
'' : datum.value\", \"labelPadding\": 10, \"tickBand\": \"extent\", \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"axis\": {\"grid\": false, \"orient\": \"left\", \"title\": \"Match Weight\"}, \"field\": \"previous_sum\", \"type\": \"quantitative\"}, \"y2\": {\"field\": \"sum\"}}}, {\"mark\": {\"type\": \"text\", \"fontWeight\": \"bold\"}, \"encoding\": {\"color\": {\"value\": \"white\"}, \"text\": {\"condition\": {\"test\": \"abs(datum.log2_bayes_factor) > 1\", \"field\": \"log2_bayes_factor\", \"format\": \".2f\", \"type\": \"nominal\"}, \"value\": \"\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"axis\": {\"orient\": \"left\"}, \"field\": \"center\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"text\", \"baseline\": \"bottom\", \"dy\": -25, \"fontWeight\": \"bold\"}, \"encoding\": {\"color\": {\"value\": \"black\"}, \"text\": {\"field\": \"column_name\", \"type\": \"nominal\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"sum_top\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"text\", \"baseline\": \"bottom\", \"dy\": -13, \"fontSize\": 8}, \"encoding\": {\"color\": {\"value\": \"grey\"}, \"text\": {\"field\": \"value_l\", \"type\": \"nominal\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"sum_top\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"text\", \"baseline\": \"bottom\", \"dy\": -5, \"fontSize\": 8}, \"encoding\": {\"color\": {\"value\": \"grey\"}, 
\"text\": {\"field\": \"value_r\", \"type\": \"nominal\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"sum_top\", \"type\": \"quantitative\"}}}]}, {\"mark\": {\"type\": \"rule\", \"color\": \"black\", \"strokeWidth\": 2, \"x2Offset\": 30, \"xOffset\": -30}, \"encoding\": {\"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"x2\": {\"field\": \"lead\"}, \"y\": {\"axis\": {\"labelExpr\": \"format(1 / (1 + pow(2, -1*datum.value)), '.2r')\", \"orient\": \"right\", \"title\": \"Probability\"}, \"field\": \"sum\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}}}], \"data\": {\"name\": \"data-6e76e82f0ddb2734cf5b1f4edbf12868\"}, \"height\": 450, \"params\": [{\"name\": \"record_number\", \"bind\": {\"input\": \"range\", \"max\": 4, \"min\": 0, \"step\": 1}, \"value\": 0}], \"resolve\": {\"axis\": {\"y\": \"independent\"}}, \"title\": {\"text\": \"Match weights waterfall chart\", \"subtitle\": \"How each comparison contributes to the final match score\"}, \"transform\": [{\"filter\": \"(datum.record_number == record_number)\"}, {\"filter\": \"(datum.bayes_factor !== 1.0)\"}, {\"window\": [{\"op\": \"sum\", \"field\": \"log2_bayes_factor\", \"as\": \"sum\"}, {\"op\": \"lead\", \"field\": \"column_name\", \"as\": \"lead\"}], \"frame\": [null, 0]}, {\"calculate\": \"datum.column_name === \\\"Final score\\\" ? datum.sum - datum.log2_bayes_factor : datum.sum\", \"as\": \"sum\"}, {\"calculate\": \"datum.lead === null ? datum.column_name : datum.lead\", \"as\": \"lead\"}, {\"calculate\": \"datum.column_name === \\\"Final score\\\" || datum.column_name === \\\"Prior match weight\\\" ? 
0 : datum.sum - datum.log2_bayes_factor\", \"as\": \"previous_sum\"}, {\"calculate\": \"datum.sum > datum.previous_sum ? datum.column_name : \\\"\\\"\", \"as\": \"top_label\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? datum.column_name : \\\"\\\"\", \"as\": \"bottom_label\"}, {\"calculate\": \"datum.sum > datum.previous_sum ? datum.sum : datum.previous_sum\", \"as\": \"sum_top\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? datum.sum : datum.previous_sum\", \"as\": \"sum_bottom\"}, {\"calculate\": \"(datum.sum + datum.previous_sum) / 2\", \"as\": \"center\"}, {\"calculate\": \"(datum.log2_bayes_factor > 0 ? \\\"+\\\" : \\\"\\\") + datum.log2_bayes_factor\", \"as\": \"text_log2_bayes_factor\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? 4 : -4\", \"as\": \"dy\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? \\\"top\\\" : \\\"bottom\\\"\", \"as\": \"baseline\"}, {\"calculate\": \"1. / (1 + pow(2, -1.*datum.sum))\", \"as\": \"prob\"}, {\"calculate\": \"0*datum.sum\", \"as\": \"zero\"}], \"width\": {\"step\": 75}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.9.3.json\", \"datasets\": {\"data-6e76e82f0ddb2734cf5b1f4edbf12868\": [{\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 0}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.01\", \"label_for_charts\": \"Percentage difference of 'amount' within 1.00%\", \"m_probability\": 0.19779258983250028, \"u_probability\": 0.0034510672745089684, \"bayes_factor\": 57.31345525874833, 
\"log2_bayes_factor\": 5.840801969581078, \"comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 1.00%` then comparison is 57.31 times more likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"256.35\", \"value_r\": \"255.78\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 0}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 2\", \"label_for_charts\": \"Levenshtein distance of memo <= 2\", \"m_probability\": 0.10817684036314526, \"u_probability\": 0.0021962259404209043, \"bayes_factor\": 49.255788474301184, \"log2_bayes_factor\": 5.622221373008662, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 2` then comparison is 49.26 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"E N BGC\", \"value_r\": \"R R BGC\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 0}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 4 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=4 days\", \"m_probability\": 0.466140837509571, \"u_probability\": 0.028893812222174884, \"bayes_factor\": 16.132894957759362, \"log2_bayes_factor\": 4.011933440166542, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `<=4 days` then comparison is 16.13 times more likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-04-29 00:00:00\", \"value_r\": \"2022-05-02 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 0}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": 0.006937383237398456, \"bayes_factor\": 1.004820207635184, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, 
\"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 0}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 1}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.01\", \"label_for_charts\": \"Percentage difference of 'amount' within 1.00%\", \"m_probability\": 0.19779258983250028, \"u_probability\": 0.0034510672745089684, \"bayes_factor\": 57.31345525874833, \"log2_bayes_factor\": 5.840801969581078, \"comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 1.00%` then comparison is 57.31 times more likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"121.37\", \"value_r\": \"121.19\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 1}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 2\", \"label_for_charts\": \"Levenshtein distance of memo <= 2\", \"m_probability\": 0.10817684036314526, \"u_probability\": 0.0021962259404209043, \"bayes_factor\": 49.255788474301184, \"log2_bayes_factor\": 5.622221373008662, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 2` then comparison is 49.26 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"E J BGC\", \"value_r\": \"Z H BGC\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, 
\"record_number\": 1}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 4 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=4 days\", \"m_probability\": 0.466140837509571, \"u_probability\": 0.028893812222174884, \"bayes_factor\": 16.132894957759362, \"log2_bayes_factor\": 4.011933440166542, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `<=4 days` then comparison is 16.13 times more likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-03-27 00:00:00\", \"value_r\": \"2022-03-31 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 1}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": 0.006937383237398456, \"bayes_factor\": 1.004820207635184, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 1}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 2}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.01\", \"label_for_charts\": \"Percentage difference of 'amount' within 1.00%\", \"m_probability\": 0.19779258983250028, \"u_probability\": 0.0034510672745089684, \"bayes_factor\": 57.31345525874833, \"log2_bayes_factor\": 5.840801969581078, \"comparison_vector_value\": 4, 
\"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 1.00%` then comparison is 57.31 times more likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"143.45\", \"value_r\": \"142.04\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 2}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 2\", \"label_for_charts\": \"Levenshtein distance of memo <= 2\", \"m_probability\": 0.10817684036314526, \"u_probability\": 0.0021962259404209043, \"bayes_factor\": 49.255788474301184, \"log2_bayes_factor\": 5.622221373008662, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 2` then comparison is 49.26 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"N R\", \"value_r\": \"D B\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 2}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 4 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=4 days\", \"m_probability\": 0.466140837509571, \"u_probability\": 0.028893812222174884, \"bayes_factor\": 16.132894957759362, \"log2_bayes_factor\": 4.011933440166542, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `<=4 days` then comparison is 16.13 times more likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-04-27 00:00:00\", \"value_r\": \"2022-05-01 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 2}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": 0.006937383237398456, \"bayes_factor\": 1.004820207635184, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": 
null, \"bar_sort_order\": 4, \"record_number\": 2}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 3}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.01\", \"label_for_charts\": \"Percentage difference of 'amount' within 1.00%\", \"m_probability\": 0.19779258983250028, \"u_probability\": 0.0034510672745089684, \"bayes_factor\": 57.31345525874833, \"log2_bayes_factor\": 5.840801969581078, \"comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 1.00%` then comparison is 57.31 times more likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"37.67\", \"value_r\": \"37.97\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 3}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 2\", \"label_for_charts\": \"Levenshtein distance of memo <= 2\", \"m_probability\": 0.10817684036314526, \"u_probability\": 0.0021962259404209043, \"bayes_factor\": 49.255788474301184, \"log2_bayes_factor\": 5.622221373008662, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 2` then comparison is 49.26 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"G M BGC\", \"value_r\": \"F S BGC\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 3}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 4 and transaction_date_r >= 
transaction_date_l\", \"label_for_charts\": \"<=4 days\", \"m_probability\": 0.466140837509571, \"u_probability\": 0.028893812222174884, \"bayes_factor\": 16.132894957759362, \"log2_bayes_factor\": 4.011933440166542, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `<=4 days` then comparison is 16.13 times more likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-05-05 00:00:00\", \"value_r\": \"2022-05-09 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 3}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": 0.006937383237398456, \"bayes_factor\": 1.004820207635184, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 3}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 4}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.01\", \"label_for_charts\": \"Percentage difference of 'amount' within 1.00%\", \"m_probability\": 0.19779258983250028, \"u_probability\": 0.0034510672745089684, \"bayes_factor\": 57.31345525874833, \"log2_bayes_factor\": 5.840801969581078, \"comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 1.00%` then comparison is 
57.31 times more likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"304.71\", \"value_r\": \"303.74\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 4}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 2\", \"label_for_charts\": \"Levenshtein distance of memo <= 2\", \"m_probability\": 0.10817684036314526, \"u_probability\": 0.0021962259404209043, \"bayes_factor\": 49.255788474301184, \"log2_bayes_factor\": 5.622221373008662, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 2` then comparison is 49.26 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"W H payment BGC\", \"value_r\": \"R B payment BGC\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 4}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 4 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=4 days\", \"m_probability\": 0.466140837509571, \"u_probability\": 0.028893812222174884, \"bayes_factor\": 16.132894957759362, \"log2_bayes_factor\": 4.011933440166542, \"comparison_vector_value\": 3, \"bayes_factor_description\": \"If comparison level is `<=4 days` then comparison is 16.13 times more likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-05-08 00:00:00\", \"value_r\": \"2022-05-11 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 4}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": 0.006937383237398456, \"bayes_factor\": 1.004820207635184, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 4}]}}, {\"mode\": \"vega-lite\"});\n",
              "</script>"
            ],
            "text/plain": [
              "alt.LayerChart(...)"
            ]
          },
          "execution_count": 13,
          "metadata": {},
          "output_type": "execute_result"
        }
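      ],
      "source": [
        "# Waterfall charts for up to five false positives: pairs the model scores as\n",
        "# matches but that the `ground_truth` labels mark as non-matches.\n",
        "pred_errors = linker.evaluation.prediction_errors_from_labels_column(\n",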
        "    \"ground_truth\", include_false_positives=True, include_false_negatives=False\n",
        ")\n",
        "linker.visualisations.waterfall_chart(pred_errors.as_record_dict(limit=5))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2024-06-07T09:23:04.257242Z",
          "iopub.status.busy": "2024-06-07T09:23:04.257017Z",
          "iopub.status.idle": "2024-06-07T09:23:05.029715Z",
          "shell.execute_reply": "2024-06-07T09:23:05.029153Z"
        }
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "<style>\n",
              "  #altair-viz-3457041e8e6e4611a130d83fe67e8edf.vega-embed {\n",
              "    width: 100%;\n",
              "    display: flex;\n",
              "  }\n",
              "\n",
              "  #altair-viz-3457041e8e6e4611a130d83fe67e8edf.vega-embed details,\n",
              "  #altair-viz-3457041e8e6e4611a130d83fe67e8edf.vega-embed details summary {\n",
              "    position: relative;\n",
              "  }\n",
              "</style>\n",
              "<div id=\"altair-viz-3457041e8e6e4611a130d83fe67e8edf\"></div>\n",
              "<script type=\"text/javascript\">\n",
              "  var VEGA_DEBUG = (typeof VEGA_DEBUG == \"undefined\") ? {} : VEGA_DEBUG;\n",
              "  (function(spec, embedOpt){\n",
              "    let outputDiv = document.currentScript.previousElementSibling;\n",
              "    if (outputDiv.id !== \"altair-viz-3457041e8e6e4611a130d83fe67e8edf\") {\n",
              "      outputDiv = document.getElementById(\"altair-viz-3457041e8e6e4611a130d83fe67e8edf\");\n",
              "    }\n",
              "    const paths = {\n",
              "      \"vega\": \"https://cdn.jsdelivr.net/npm/vega@5?noext\",\n",
              "      \"vega-lib\": \"https://cdn.jsdelivr.net/npm/vega-lib?noext\",\n",
              "      \"vega-lite\": \"https://cdn.jsdelivr.net/npm/vega-lite@5.17.0?noext\",\n",
              "      \"vega-embed\": \"https://cdn.jsdelivr.net/npm/vega-embed@6?noext\",\n",
              "    };\n",
              "\n",
              "    function maybeLoadScript(lib, version) {\n",
              "      var key = `${lib.replace(\"-\", \"\")}_version`;\n",
              "      return (VEGA_DEBUG[key] == version) ?\n",
              "        Promise.resolve(paths[lib]) :\n",
              "        new Promise(function(resolve, reject) {\n",
              "          var s = document.createElement('script');\n",
              "          document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
              "          s.async = true;\n",
              "          s.onload = () => {\n",
              "            VEGA_DEBUG[key] = version;\n",
              "            return resolve(paths[lib]);\n",
              "          };\n",
              "          s.onerror = () => reject(`Error loading script: ${paths[lib]}`);\n",
              "          s.src = paths[lib];\n",
              "        });\n",
              "    }\n",
              "\n",
              "    function showError(err) {\n",
              "      outputDiv.innerHTML = `<div class=\"error\" style=\"color:red;\">${err}</div>`;\n",
              "      throw err;\n",
              "    }\n",
              "\n",
              "    function displayChart(vegaEmbed) {\n",
              "      vegaEmbed(outputDiv, spec, embedOpt)\n",
              "        .catch(err => showError(`Javascript Error: ${err.message}<br>This usually means there's a typo in your chart specification. See the javascript console for the full traceback.`));\n",
              "    }\n",
              "\n",
              "    if(typeof define === \"function\" && define.amd) {\n",
              "      requirejs.config({paths});\n",
              "      require([\"vega-embed\"], displayChart, err => showError(`Error loading script: ${err.message}`));\n",
              "    } else {\n",
              "      maybeLoadScript(\"vega\", \"5\")\n",
              "        .then(() => maybeLoadScript(\"vega-lite\", \"5.17.0\"))\n",
              "        .then(() => maybeLoadScript(\"vega-embed\", \"6\"))\n",
              "        .catch(showError)\n",
              "        .then(() => displayChart(vegaEmbed));\n",
              "    }\n",
              "  })({\"config\": {\"view\": {\"continuousWidth\": 400, \"continuousHeight\": 300}}, \"layer\": [{\"layer\": [{\"mark\": \"rule\", \"encoding\": {\"color\": {\"value\": \"black\"}, \"size\": {\"value\": 0.5}, \"y\": {\"field\": \"zero\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"bar\", \"width\": 60}, \"encoding\": {\"color\": {\"condition\": {\"test\": \"(datum.log2_bayes_factor < 0)\", \"value\": \"red\"}, \"value\": \"green\"}, \"opacity\": {\"condition\": {\"test\": \"datum.column_name == 'Prior match weight' || datum.column_name == 'Final score'\", \"value\": 1}, \"value\": 0.5}, \"tooltip\": [{\"field\": \"column_name\", \"title\": \"Comparison column\", \"type\": \"nominal\"}, {\"field\": \"value_l\", \"title\": \"Value (L)\", \"type\": \"nominal\"}, {\"field\": \"value_r\", \"title\": \"Value (R)\", \"type\": \"nominal\"}, {\"field\": \"label_for_charts\", \"title\": \"Label\", \"type\": \"ordinal\"}, {\"field\": \"sql_condition\", \"title\": \"SQL condition\", \"type\": \"nominal\"}, {\"field\": \"comparison_vector_value\", \"title\": \"Comparison vector value\", \"type\": \"nominal\"}, {\"field\": \"bayes_factor\", \"format\": \",.4f\", \"title\": \"Bayes factor = m/u\", \"type\": \"quantitative\"}, {\"field\": \"log2_bayes_factor\", \"format\": \",.4f\", \"title\": \"Match weight = log2(m/u)\", \"type\": \"quantitative\"}, {\"field\": \"prob\", \"format\": \".4f\", \"title\": \"Cumulative match probability\", \"type\": \"quantitative\"}, {\"field\": \"bayes_factor_description\", \"title\": \"Match weight description\", \"type\": \"nominal\"}], \"x\": {\"axis\": {\"grid\": true, \"labelAlign\": \"center\", \"labelAngle\": -20, \"labelExpr\": \"datum.value == 'Prior' || datum.value == 'Final score' ? 
'' : datum.value\", \"labelPadding\": 10, \"tickBand\": \"extent\", \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"axis\": {\"grid\": false, \"orient\": \"left\", \"title\": \"Match Weight\"}, \"field\": \"previous_sum\", \"type\": \"quantitative\"}, \"y2\": {\"field\": \"sum\"}}}, {\"mark\": {\"type\": \"text\", \"fontWeight\": \"bold\"}, \"encoding\": {\"color\": {\"value\": \"white\"}, \"text\": {\"condition\": {\"test\": \"abs(datum.log2_bayes_factor) > 1\", \"field\": \"log2_bayes_factor\", \"format\": \".2f\", \"type\": \"nominal\"}, \"value\": \"\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"axis\": {\"orient\": \"left\"}, \"field\": \"center\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"text\", \"baseline\": \"bottom\", \"dy\": -25, \"fontWeight\": \"bold\"}, \"encoding\": {\"color\": {\"value\": \"black\"}, \"text\": {\"field\": \"column_name\", \"type\": \"nominal\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"sum_top\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"text\", \"baseline\": \"bottom\", \"dy\": -13, \"fontSize\": 8}, \"encoding\": {\"color\": {\"value\": \"grey\"}, \"text\": {\"field\": \"value_l\", \"type\": \"nominal\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"sum_top\", \"type\": \"quantitative\"}}}, {\"mark\": {\"type\": \"text\", \"baseline\": \"bottom\", \"dy\": -5, \"fontSize\": 8}, \"encoding\": {\"color\": {\"value\": \"grey\"}, 
\"text\": {\"field\": \"value_r\", \"type\": \"nominal\"}, \"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"y\": {\"field\": \"sum_top\", \"type\": \"quantitative\"}}}]}, {\"mark\": {\"type\": \"rule\", \"color\": \"black\", \"strokeWidth\": 2, \"x2Offset\": 30, \"xOffset\": -30}, \"encoding\": {\"x\": {\"axis\": {\"labelAngle\": -20, \"title\": \"Column\"}, \"field\": \"column_name\", \"sort\": {\"field\": \"bar_sort_order\", \"order\": \"ascending\"}, \"type\": \"nominal\"}, \"x2\": {\"field\": \"lead\"}, \"y\": {\"axis\": {\"labelExpr\": \"format(1 / (1 + pow(2, -1*datum.value)), '.2r')\", \"orient\": \"right\", \"title\": \"Probability\"}, \"field\": \"sum\", \"scale\": {\"zero\": false}, \"type\": \"quantitative\"}}}], \"data\": {\"name\": \"data-872e647eb63cc6ba9bf0b227e6f1f833\"}, \"height\": 450, \"params\": [{\"name\": \"record_number\", \"bind\": {\"input\": \"range\", \"max\": 4, \"min\": 0, \"step\": 1}, \"value\": 0}], \"resolve\": {\"axis\": {\"y\": \"independent\"}}, \"title\": {\"text\": \"Match weights waterfall chart\", \"subtitle\": \"How each comparison contributes to the final match score\"}, \"transform\": [{\"filter\": \"(datum.record_number == record_number)\"}, {\"filter\": \"(datum.bayes_factor !== 1.0)\"}, {\"window\": [{\"op\": \"sum\", \"field\": \"log2_bayes_factor\", \"as\": \"sum\"}, {\"op\": \"lead\", \"field\": \"column_name\", \"as\": \"lead\"}], \"frame\": [null, 0]}, {\"calculate\": \"datum.column_name === \\\"Final score\\\" ? datum.sum - datum.log2_bayes_factor : datum.sum\", \"as\": \"sum\"}, {\"calculate\": \"datum.lead === null ? datum.column_name : datum.lead\", \"as\": \"lead\"}, {\"calculate\": \"datum.column_name === \\\"Final score\\\" || datum.column_name === \\\"Prior match weight\\\" ? 
0 : datum.sum - datum.log2_bayes_factor\", \"as\": \"previous_sum\"}, {\"calculate\": \"datum.sum > datum.previous_sum ? datum.column_name : \\\"\\\"\", \"as\": \"top_label\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? datum.column_name : \\\"\\\"\", \"as\": \"bottom_label\"}, {\"calculate\": \"datum.sum > datum.previous_sum ? datum.sum : datum.previous_sum\", \"as\": \"sum_top\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? datum.sum : datum.previous_sum\", \"as\": \"sum_bottom\"}, {\"calculate\": \"(datum.sum + datum.previous_sum) / 2\", \"as\": \"center\"}, {\"calculate\": \"(datum.log2_bayes_factor > 0 ? \\\"+\\\" : \\\"\\\") + datum.log2_bayes_factor\", \"as\": \"text_log2_bayes_factor\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? 4 : -4\", \"as\": \"dy\"}, {\"calculate\": \"datum.sum < datum.previous_sum ? \\\"top\\\" : \\\"bottom\\\"\", \"as\": \"baseline\"}, {\"calculate\": \"1. / (1 + pow(2, -1.*datum.sum))\", \"as\": \"prob\"}, {\"calculate\": \"0*datum.sum\", \"as\": \"zero\"}], \"width\": {\"step\": 75}, \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.9.3.json\", \"datasets\": {\"data-872e647eb63cc6ba9bf0b227e6f1f833\": [{\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 0}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.3\", \"label_for_charts\": \"Percentage difference of 'amount' within 30.00%\", \"m_probability\": 0.0007603567508388371, \"u_probability\": 0.08587395590505811, \"bayes_factor\": 0.008854334737757794, 
\"log2_bayes_factor\": -6.819400369169093, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 30.00%` then comparison is  112.94 times less likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"31.6\", \"value_r\": \"35.57\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 0}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 10\", \"label_for_charts\": \"Levenshtein distance of memo <= 10\", \"m_probability\": 0.14779726216531575, \"u_probability\": 0.09805717694175792, \"bayes_factor\": 1.507255937554693, \"log2_bayes_factor\": 0.5919244126159111, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 10` then comparison is 1.51 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"T J. SKINNER mo\", \"value_r\": \"T J. SKINNER money CHQ\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 0}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 30 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=30 days\", \"m_probability\": 0.04864613690105879, \"u_probability\": 0.16279157055008106, \"bayes_factor\": 0.2988246672520021, \"log2_bayes_factor\": -1.7426288508657286, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `<=30 days` then comparison is  3.35 times less likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-03-31 00:00:00\", \"value_r\": \"2022-04-19 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 0}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": -23.438124206937793, \"bayes_factor\": 8.798762022263209e-08, \"comparison_vector_value\": null, \"m_probability\": null, 
\"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 0}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 1}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.3\", \"label_for_charts\": \"Percentage difference of 'amount' within 30.00%\", \"m_probability\": 0.0007603567508388371, \"u_probability\": 0.08587395590505811, \"bayes_factor\": 0.008854334737757794, \"log2_bayes_factor\": -6.819400369169093, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 30.00%` then comparison is  112.94 times less likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"11756.59\", \"value_r\": \"13069.32\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 1}, {\"sql_condition\": \"ELSE\", \"label_for_charts\": \"All other comparisons\", \"m_probability\": 0.058106596256656165, \"u_probability\": 0.8723247127661086, \"bayes_factor\": 0.06661120040082592, \"log2_bayes_factor\": -3.9080914088124747, \"comparison_vector_value\": 0, \"bayes_factor_description\": \"If comparison level is `all other comparisons` then comparison is  15.01 times less likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"P GODEFFROY payment WRE\", \"value_r\": \"P GODEFFROY\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 1}, 
{\"sql_condition\": \"transaction_date_r - transaction_date_l <= 1 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"1 day\", \"m_probability\": 0.38903576440838894, \"u_probability\": 0.01929737000675606, \"bayes_factor\": 20.160040682859194, \"log2_bayes_factor\": 4.333426645079358, \"comparison_vector_value\": 4, \"bayes_factor_description\": \"If comparison level is `1 day` then comparison is 20.16 times more likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-04-10 00:00:00\", \"value_r\": \"2022-04-11 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 1}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": -21.862084532421093, \"bayes_factor\": 2.6233533294694627e-07, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 1}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 2}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.3\", \"label_for_charts\": \"Percentage difference of 'amount' within 30.00%\", \"m_probability\": 0.0007603567508388371, \"u_probability\": 0.08587395590505811, \"bayes_factor\": 0.008854334737757794, \"log2_bayes_factor\": -6.819400369169093, \"comparison_vector_value\": 1, 
\"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 30.00%` then comparison is  112.94 times less likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"727.17\", \"value_r\": \"808.1\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 2}, {\"sql_condition\": \"levenshtein(\\\"memo_l\\\", \\\"memo_r\\\") <= 6\", \"label_for_charts\": \"Levenshtein distance of memo <= 6\", \"m_probability\": 0.2591582087321121, \"u_probability\": 0.02739956704423493, \"bayes_factor\": 9.458478242145832, \"log2_bayes_factor\": 3.241608089578434, \"comparison_vector_value\": 2, \"bayes_factor_description\": \"If comparison level is `levenshtein distance of memo <= 6` then comparison is 9.46 times more likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"GIAMBATTISTA A \", \"value_r\": \"GIAMBATTISTA A  CHQ\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 2}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 30 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=30 days\", \"m_probability\": 0.04864613690105879, \"u_probability\": 0.16279157055008106, \"bayes_factor\": 0.2988246672520021, \"log2_bayes_factor\": -1.7426288508657286, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `<=30 days` then comparison is  3.35 times less likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-04-15 00:00:00\", \"value_r\": \"2022-04-30 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 2}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": -20.788440529975272, \"bayes_factor\": 5.521484246425513e-07, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": 
\"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 2}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 3}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.1\", \"label_for_charts\": \"Percentage difference of 'amount' within 10.00%\", \"m_probability\": 0.23149568734526935, \"u_probability\": 0.02613255263334084, \"bayes_factor\": 8.858517979216424, \"log2_bayes_factor\": 3.1470653575978704, \"comparison_vector_value\": 2, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 10.00%` then comparison is 8.86 times more likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"26103.23\", \"value_r\": \"27036.92\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 3}, {\"sql_condition\": \"ELSE\", \"label_for_charts\": \"All other comparisons\", \"m_probability\": 0.058106596256656165, \"u_probability\": 0.8723247127661086, \"bayes_factor\": 0.06661120040082592, \"log2_bayes_factor\": -3.9080914088124747, \"comparison_vector_value\": 0, \"bayes_factor_description\": \"If comparison level is `all other comparisons` then comparison is  15.01 times less likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"J H d761f46c WR\", \"value_r\": \"J H\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 3}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 30 and transaction_date_r >= transaction_date_l\", 
\"label_for_charts\": \"<=30 days\", \"m_probability\": 0.04864613690105879, \"u_probability\": 0.16279157055008106, \"bayes_factor\": 0.2988246672520021, \"log2_bayes_factor\": -1.7426288508657286, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `<=30 days` then comparison is  3.35 times less likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-03-13 00:00:00\", \"value_r\": \"2022-04-01 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 3}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": -17.971674301599215, \"bayes_factor\": 3.890334664244009e-06, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 3}, {\"column_name\": \"Prior\", \"label_for_charts\": \"Starting match weight (prior)\", \"sql_condition\": null, \"log2_bayes_factor\": -15.468019399518882, \"bayes_factor\": 2.206287920573635e-05, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 0, \"record_number\": 4}, {\"sql_condition\": \"(ABS(\\\"amount_l\\\" - \\\"amount_r\\\") / (CASE WHEN \\\"amount_r\\\" > \\\"amount_l\\\" THEN \\\"amount_r\\\" ELSE \\\"amount_l\\\" END)) < 0.1\", \"label_for_charts\": \"Percentage difference of 'amount' within 10.00%\", \"m_probability\": 0.23149568734526935, \"u_probability\": 0.02613255263334084, \"bayes_factor\": 8.858517979216424, \"log2_bayes_factor\": 3.1470653575978704, \"comparison_vector_value\": 2, \"bayes_factor_description\": \"If comparison level is `percentage difference of 'amount' within 10.00%` then comparison is 8.86 times more 
likely to be a match\", \"column_name\": \"amount\", \"value_l\": \"63.84\", \"value_r\": \"67.64\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 1, \"record_number\": 4}, {\"sql_condition\": \"ELSE\", \"label_for_charts\": \"All other comparisons\", \"m_probability\": 0.058106596256656165, \"u_probability\": 0.8723247127661086, \"bayes_factor\": 0.06661120040082592, \"log2_bayes_factor\": -3.9080914088124747, \"comparison_vector_value\": 0, \"bayes_factor_description\": \"If comparison level is `all other comparisons` then comparison is  15.01 times less likely to be a match\", \"column_name\": \"memo\", \"value_l\": \"P G\", \"value_r\": \"P G donation BGC\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 2, \"record_number\": 4}, {\"sql_condition\": \"transaction_date_r - transaction_date_l <= 30 and transaction_date_r >= transaction_date_l\", \"label_for_charts\": \"<=30 days\", \"m_probability\": 0.04864613690105879, \"u_probability\": 0.16279157055008106, \"bayes_factor\": 0.2988246672520021, \"log2_bayes_factor\": -1.7426288508657286, \"comparison_vector_value\": 1, \"bayes_factor_description\": \"If comparison level is `<=30 days` then comparison is  3.35 times less likely to be a match\", \"column_name\": \"transaction_date\", \"value_l\": \"2022-03-16 00:00:00\", \"value_r\": \"2022-04-03 00:00:00\", \"term_frequency_adjustment\": false, \"bar_sort_order\": 3, \"record_number\": 4}, {\"column_name\": \"Final score\", \"label_for_charts\": \"Final score\", \"sql_condition\": null, \"log2_bayes_factor\": -17.971674301599215, \"bayes_factor\": 3.890334664244009e-06, \"comparison_vector_value\": null, \"m_probability\": null, \"u_probability\": null, \"bayes_factor_description\": null, \"value_l\": \"\", \"value_r\": \"\", \"term_frequency_adjustment\": null, \"bar_sort_order\": 4, \"record_number\": 4}]}}, {\"mode\": \"vega-lite\"});\n",
              "</script>"
            ],
            "text/plain": [
              "alt.LayerChart(...)"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
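      ],
      "source": [
        "# Waterfall charts for up to five false negatives: pairs the `ground_truth`\n",
        "# labels mark as matches but that the model scores as non-matches.\n",
        "pred_errors = linker.evaluation.prediction_errors_from_labels_column(\n",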
        "    \"ground_truth\", include_false_positives=False, include_false_negatives=True\n",
        ")\n",
        "linker.visualisations.waterfall_chart(pred_errors.as_record_dict(limit=5))"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.8"
    },
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "state": {
          "0cb4a943a08a42c7841ca32d466f9eed": {
            "model_module": "@jupyter-widgets/controls",
            "model_module_version": "2.0.0",
            "model_name": "FloatProgressModel",
            "state": {
              "_dom_classes": [],
              "_model_module": "@jupyter-widgets/controls",
              "_model_module_version": "2.0.0",
              "_model_name": "FloatProgressModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/controls",
              "_view_module_version": "2.0.0",
              "_view_name": "ProgressView",
              "bar_style": "",
              "description": "",
              "description_allow_html": false,
              "layout": "IPY_MODEL_fd157120a2ca488496c737cec882713d",
              "max": 100,
              "min": 0,
              "orientation": "horizontal",
              "style": "IPY_MODEL_ed234594aea94bf98ffb67a51d3811f4",
              "tabbable": null,
              "tooltip": null,
              "value": 100
            }
          },
          "2bae68755fc34e38ac69e792f314ba8e": {
            "model_module": "@jupyter-widgets/controls",
            "model_module_version": "2.0.0",
            "model_name": "ProgressStyleModel",
            "state": {
              "_model_module": "@jupyter-widgets/controls",
              "_model_module_version": "2.0.0",
              "_model_name": "ProgressStyleModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/base",
              "_view_module_version": "2.0.0",
              "_view_name": "StyleView",
              "bar_color": "black",
              "description_width": ""
            }
          },
          "4430006dcc174ff092d96adf68c301ff": {
            "model_module": "@jupyter-widgets/controls",
            "model_module_version": "2.0.0",
            "model_name": "FloatProgressModel",
            "state": {
              "_dom_classes": [],
              "_model_module": "@jupyter-widgets/controls",
              "_model_module_version": "2.0.0",
              "_model_name": "FloatProgressModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/controls",
              "_view_module_version": "2.0.0",
              "_view_name": "ProgressView",
              "bar_style": "",
              "description": "",
              "description_allow_html": false,
              "layout": "IPY_MODEL_5c32bb2a7a714bd79accac15915b17e5",
              "max": 100,
              "min": 0,
              "orientation": "horizontal",
              "style": "IPY_MODEL_6222247c7cbe45b19cfeb9b182147a18",
              "tabbable": null,
              "tooltip": null,
              "value": 100
            }
          },
          "5c32bb2a7a714bd79accac15915b17e5": {
            "model_module": "@jupyter-widgets/base",
            "model_module_version": "2.0.0",
            "model_name": "LayoutModel",
            "state": {
              "_model_module": "@jupyter-widgets/base",
              "_model_module_version": "2.0.0",
              "_model_name": "LayoutModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/base",
              "_view_module_version": "2.0.0",
              "_view_name": "LayoutView",
              "align_content": null,
              "align_items": null,
              "align_self": null,
              "border_bottom": null,
              "border_left": null,
              "border_right": null,
              "border_top": null,
              "bottom": null,
              "display": null,
              "flex": null,
              "flex_flow": null,
              "grid_area": null,
              "grid_auto_columns": null,
              "grid_auto_flow": null,
              "grid_auto_rows": null,
              "grid_column": null,
              "grid_gap": null,
              "grid_row": null,
              "grid_template_areas": null,
              "grid_template_columns": null,
              "grid_template_rows": null,
              "height": null,
              "justify_content": null,
              "justify_items": null,
              "left": null,
              "margin": null,
              "max_height": null,
              "max_width": null,
              "min_height": null,
              "min_width": null,
              "object_fit": null,
              "object_position": null,
              "order": null,
              "overflow": null,
              "padding": null,
              "right": null,
              "top": null,
              "visibility": null,
              "width": "auto"
            }
          },
          "6222247c7cbe45b19cfeb9b182147a18": {
            "model_module": "@jupyter-widgets/controls",
            "model_module_version": "2.0.0",
            "model_name": "ProgressStyleModel",
            "state": {
              "_model_module": "@jupyter-widgets/controls",
              "_model_module_version": "2.0.0",
              "_model_name": "ProgressStyleModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/base",
              "_view_module_version": "2.0.0",
              "_view_name": "StyleView",
              "bar_color": "black",
              "description_width": ""
            }
          },
          "63719efff46e49ecba53edb438f35c3f": {
            "model_module": "@jupyter-widgets/controls",
            "model_module_version": "2.0.0",
            "model_name": "FloatProgressModel",
            "state": {
              "_dom_classes": [],
              "_model_module": "@jupyter-widgets/controls",
              "_model_module_version": "2.0.0",
              "_model_name": "FloatProgressModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/controls",
              "_view_module_version": "2.0.0",
              "_view_name": "ProgressView",
              "bar_style": "",
              "description": "",
              "description_allow_html": false,
              "layout": "IPY_MODEL_921bb606e07743f7a252c05830098a57",
              "max": 100,
              "min": 0,
              "orientation": "horizontal",
              "style": "IPY_MODEL_2bae68755fc34e38ac69e792f314ba8e",
              "tabbable": null,
              "tooltip": null,
              "value": 100
            }
          },
          "921bb606e07743f7a252c05830098a57": {
            "model_module": "@jupyter-widgets/base",
            "model_module_version": "2.0.0",
            "model_name": "LayoutModel",
            "state": {
              "_model_module": "@jupyter-widgets/base",
              "_model_module_version": "2.0.0",
              "_model_name": "LayoutModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/base",
              "_view_module_version": "2.0.0",
              "_view_name": "LayoutView",
              "align_content": null,
              "align_items": null,
              "align_self": null,
              "border_bottom": null,
              "border_left": null,
              "border_right": null,
              "border_top": null,
              "bottom": null,
              "display": null,
              "flex": null,
              "flex_flow": null,
              "grid_area": null,
              "grid_auto_columns": null,
              "grid_auto_flow": null,
              "grid_auto_rows": null,
              "grid_column": null,
              "grid_gap": null,
              "grid_row": null,
              "grid_template_areas": null,
              "grid_template_columns": null,
              "grid_template_rows": null,
              "height": null,
              "justify_content": null,
              "justify_items": null,
              "left": null,
              "margin": null,
              "max_height": null,
              "max_width": null,
              "min_height": null,
              "min_width": null,
              "object_fit": null,
              "object_position": null,
              "order": null,
              "overflow": null,
              "padding": null,
              "right": null,
              "top": null,
              "visibility": null,
              "width": "auto"
            }
          },
          "ed234594aea94bf98ffb67a51d3811f4": {
            "model_module": "@jupyter-widgets/controls",
            "model_module_version": "2.0.0",
            "model_name": "ProgressStyleModel",
            "state": {
              "_model_module": "@jupyter-widgets/controls",
              "_model_module_version": "2.0.0",
              "_model_name": "ProgressStyleModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/base",
              "_view_module_version": "2.0.0",
              "_view_name": "StyleView",
              "bar_color": "black",
              "description_width": ""
            }
          },
          "fd157120a2ca488496c737cec882713d": {
            "model_module": "@jupyter-widgets/base",
            "model_module_version": "2.0.0",
            "model_name": "LayoutModel",
            "state": {
              "_model_module": "@jupyter-widgets/base",
              "_model_module_version": "2.0.0",
              "_model_name": "LayoutModel",
              "_view_count": null,
              "_view_module": "@jupyter-widgets/base",
              "_view_module_version": "2.0.0",
              "_view_name": "LayoutView",
              "align_content": null,
              "align_items": null,
              "align_self": null,
              "border_bottom": null,
              "border_left": null,
              "border_right": null,
              "border_top": null,
              "bottom": null,
              "display": null,
              "flex": null,
              "flex_flow": null,
              "grid_area": null,
              "grid_auto_columns": null,
              "grid_auto_flow": null,
              "grid_auto_rows": null,
              "grid_column": null,
              "grid_gap": null,
              "grid_row": null,
              "grid_template_areas": null,
              "grid_template_columns": null,
              "grid_template_rows": null,
              "height": null,
              "justify_content": null,
              "justify_items": null,
              "left": null,
              "margin": null,
              "max_height": null,
              "max_width": null,
              "min_height": null,
              "min_width": null,
              "object_fit": null,
              "object_position": null,
              "order": null,
              "overflow": null,
              "padding": null,
              "right": null,
              "top": null,
              "visibility": null,
              "width": "auto"
            }
          }
        },
        "version_major": 2,
        "version_minor": 0
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}